The current state of the art in comment spam

Write, geek! gets a fair amount of spam replies. This sur­prised me at first, when it began hap­pen­ing almost imme­di­ate­ly after the blog was set up and con­tent was post­ed. I should have known bet­ter; there’s almost no cost to spam­mers in spam­ming even unpop­u­lar blogs, so why would they make an excep­tion for mine?

I’m using the Akismet plu­g­in for Word­Press, so it’s not like any of these com­ments actu­al­ly make it to my blog. In fact, I’d nev­er even have to see them, if not for the fact that I reg­u­lar­ly clean these com­ments out of my spam fold­er by hand. I do this part­ly to ensure that noth­ing legit­i­mate gets fil­tered incor­rect­ly (which hap­pens some­times) and part­ly because I like to sort of keep tabs on the cur­rent ‘state of the art’ in spam­ming.

The cur­rent state of the art in spam­ming is this: the com­ments are get­ting bet­ter. No longer are com­ments jam-packed with dozens of links com­mon­place (one par­tic­u­lar default Word­Press set­ting prob­a­bly made those almost 100% inef­fec­tive), but they’ve been large­ly replaced with com­ments that mas­quer­ade as… actu­al com­ments!

The idea of noise disguised as signal is nothing new if you’ve used e-mail in the last 15 years, but that the noise is getting better (read: more difficult for humans to detect) is somewhat surprising. Of course, these comments are no match for a large, distributed system like Akismet, which all-knowingly sees what’s being posted to probably millions of blogs, but the well-disguised, largely pseudo-flattering comments are probably now designed to get human blog authors to click the “Not Spam” button, freeing the comments from the spam box so that they can do their SEO-based dirty work.

Of course, gentle readers, I’m far too smart to fall for that, but not so blinded by my hatred for spam as to be unable to appreciate a well-crafted work of authorship, like this one I just found:

Spam that reads "Excellent read, I just passed this onto a colleague who was doing a little research on that. And he actually bought me lunch because I found it for him smile So let me rephrase that: Thanks for lunch!"

Sure, it’s not per­fect, but some­one out there put some mod­icum of thought into it, which is the least you could ask of the author of a work that’s going to be dis­trib­uted on a mas­sive scale.

Plus, it’s a lot bet­ter than this anti-gem I also just found:

Spam that reads "Why jesus allows this sort of thing to continue is a mystery"

Can you get more unin­ten­tion­al­ly self-referential than that? (No, you can­not… and yes, that was a chal­lenge.)

Upgraded to WordPress 3.0

The old adage (which I think I made up) about spend­ing more time geek­ing around with a Word­Press instal­la­tion than actu­al­ly writ­ing in the damned blog holds true, ladies and gen­tle­men.

I just fin­ished upgrad­ing this fine blog to the newly-stable Word­Press 3.0.

In case you were won­der­ing and/or sit­ting on the edge of your seats, I took great care to:

  1. Dis­able all of my plu­g­ins
  2. Dump a copy of my Word­Press MySQL data­base using the aptly-titled mysqldump
  3. tar a copy of my Word­Press direc­to­ry
  4. Do the upgrade!
  5. Re-enable the plu­g­ins one-by-one, mak­ing sure each works (or at least doesn’t break any­thing)
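For the curious, steps 2 and 3 above can be sketched roughly as follows. This is only an illustration, not the exact commands I ran: the database name, user name, paths and the `backup_commands` helper are all made-up placeholders, and the script only builds the command lines rather than running them.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def backup_commands(db_name: str, wp_dir: str, out_dir: str,
                    stamp: str, db_user: str) -> List[List[str]]:
    """Build the mysqldump and tar command lines for a pre-upgrade backup.

    Every argument is a placeholder chosen by the caller; nothing here
    is specific to a particular WordPress installation.
    """
    wp_path = PurePosixPath(wp_dir)
    out = PurePosixPath(out_dir)
    dump_file = out / f"{db_name}-{stamp}.sql"
    archive = out / f"{wp_path.name}-{stamp}.tar.gz"
    return [
        # Step 2: dump the database. A bare --password makes mysqldump
        # prompt for the password instead of taking it on the command line.
        ["mysqldump", f"--user={db_user}", "--password",
         f"--result-file={dump_file}", db_name],
        # Step 3: archive the WordPress directory, relative to its parent.
        ["tar", "-czf", str(archive), "-C", str(wp_path.parent), wp_path.name],
    ]


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    for argv in backup_commands("wordpress", "/var/www/blog",
                                "/tmp/backup", "pre-3.0", "wp_user"):
        print(shlex.join(argv))
```

Printing the commands first, instead of running them blindly, gives you one last chance to notice a wrong path before the upgrade.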

While I know not every­one is so lucky, I’m glad to see that every­thing appears to work here, because I’d be death­ly embar­rassed if, you know, Google or Bing’s webcrawler came by and things weren’t look­ing up to my usu­al stan­dards.

Why I don’t worry about blog stats, not even a little bit

I don’t obsess over this blog’s traf­fic stats. Doing so would be an exam­ple of kick­ing my own ass.

This graph is unim­por­tant.

So while I use both Google Ana­lyt­ics and the Word­Press Stats plu­g­in, I don’t care a whit about the num­bers. I don’t even have to check them to know that they are mean­ing­less; they’re close enough to zero that they might as well be. (Words I’ve nev­er spo­ken: “I had 12 pageviews today, up from 10. High and to the right, baby!”)

I can’t sep­a­rate bot traf­fic from human traf­fic, and for all I know, I’m prob­a­bly respon­si­ble for some inci­den­tal pageviews… at least if I hap­pen to load pages when not signed in to Word­Press. And why should I care about pageviews, any­way? It’s not like I’m look­ing to sell ads.

So why do I con­tin­ue to use not one, but two solu­tions to not give me num­bers? For the qual­i­ta­tive data. I can’t get enough of those.

My two favorites are as fol­lows: refer­rers and search terms (which are, them­selves, refer­rers, any­way). Both of these give me infor­ma­tion that is actu­al­ly use­ful, right now. Search terms tell me about a case where some­one was look­ing for some­thing and found my post’s title and/or sum­ma­ry promis­ing enough to actu­al­ly click through. And refer­rers, clear­ly, show me who (if any­one) is dri­ving peo­ple my way.
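To make the "search terms are referrers" point concrete, here is a minimal sketch of how a stats tool might pull the search phrase out of a referrer URL. It is not how Google Analytics or the WordPress Stats plugin actually does it; the `search_terms` function and the host-to-parameter table are my own assumptions for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed mapping from a fragment of the search engine's host name to the
# query-string parameter that carries the search phrase.
SEARCH_PARAMS = {"google.": "q", "bing.": "q", "yahoo.": "p"}


def search_terms(referrer: str) -> Optional[str]:
    """Return the search phrase embedded in a referrer URL, if there is one."""
    parsed = urlparse(referrer)
    host = parsed.netloc.lower()
    for fragment, param in SEARCH_PARAMS.items():
        if fragment in host:
            values = parse_qs(parsed.query).get(param)
            return values[0] if values else None
    # Not a search engine we recognize: just an ordinary referrer.
    return None


if __name__ == "__main__":
    print(search_terms("http://www.google.com/search?q=vivitar+clipshot+os+x"))
    print(search_terms("http://example.com/blogroll"))
```

A referrer from a recognized search engine yields the phrase someone typed; anything else comes back as `None` and stays a plain referrer.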

(Even in my past life on Mul­ti­ply, I hooked my account up with Site Meter’s free ser­vice to see if they could show me any insight­ful stats. I took a look through what they offered and found that all I real­ly cared about were the refer­rers… which were, more often than not, hilar­i­ous. Web brows­er, OS and screen res­o­lu­tion can be inter­est­ing for see­ing how my vis­i­tors stack up against Web users as a whole, but what am I going to do with that sort of insight? Fix IE6 CSS issues? Ha.)

The qualitative data that these services collect from my blog have shown me that people have found my post about the crappy Vivitar Clipshot, some even wondering if it’s OS X-compatible. (Hint: it isn’t.) A bunch of different search terms brought people to my logo/visual puns post. And one search that didn’t even logically match up with content I’ve posted, “recently learned words reappearing,” gives me a great idea for a future post!

Should I be wor­ry­ing more about appeal­ing to the mass­es, or about cre­at­ing the sort of con­tent that peo­ple who actu­al­ly do vis­it are inter­est­ed in? That’s easy. The search­es and refer­rers have shown me that (please cue the schmaltzy music) I’ve touched people’s lives… even if I didn’t nec­es­sar­i­ly give them any­thing of val­ue, and per­haps even wast­ed their time with con­tent that wasn’t rel­e­vant to their inter­ests. I made a dif­fer­ence!

What’s all the PubSubHubBub hubbub?

Gen­er­al­ly speak­ing, I’m a fan of emerg­ing tech­nolo­gies and stuff like that. I just don’t always get it right off the bat.

I first heard of RSS/Atom in 2002 or 2003, whenever LiveJournal started actively pushing syndication, making feeds on journals discoverable. I looked upon these alien terms with interest, but some confusion. Wait, I can subscribe to a blog? Why would I want to do that?

I know what I prob­a­bly sound­ed like back then. Per­haps in a cou­ple of years, I’ll be laugh­ing at myself, won­der­ing what I’d do with­out Pub­Sub­Hub­Bub. Just per­haps.

For now, though, I’m not quite sure I get it. Since Google Reader now supports the format, I went ahead and found a WordPress plugin to enable it here on writegeek. I understand that to an RSS subscriber, it means faster or near-instantaneous updates. And to a publisher, it means not only faster updates for one’s readers, but less load on the server, since millions of desktop feed-readers won’t be regularly requesting one’s RSS file. (Not that that applies to me… yet.)
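As far as I can tell from the spec, the publisher's side of this boils down to a small form-encoded POST that tells a hub "this feed has changed." The sketch below builds such a ping without sending it. The `build_publish_ping` helper and both URLs are placeholders of mine, and I can't say this is what the plugin does under the hood.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_ping(hub_url: str, feed_url: str) -> Request:
    """Build (but do not send) a PubSubHubbub 'publish' notification."""
    # The ping is an ordinary form post: hub.mode=publish plus the feed URL.
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical hub and feed addresses, for illustration only.
    ping = build_publish_ping("https://hub.example.com/", "http://example.com/feed/")
    print(ping.get_method(), ping.full_url)
    print(ping.data.decode("ascii"))
```

Actually sending it would be a matter of passing the request to `urllib.request.urlopen`. The hub then fetches the feed itself and pushes the new entries to subscribers, which is where the reduced polling load would come from.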

Yeah, I’m a bit intrigued at the instant pub­lish­ing, but have a bunch of unan­swered ques­tions. Which servers should I be ping­ing? What moti­vates one to run a serv­er? What are their busi­ness mod­els? A cou­ple of years down the road, when they real­ize that they’re run­ning the most pop­u­lar servers but still aren’t mak­ing mon­ey, will they be putting ads in my feed? And I think I read some­thing about servers talk­ing to each oth­er; how does that work?

There seems to be nothing to lose, no lock-in or single baskets in which to place all of my proverbial eggs, so I’ll try it out. (That was basically the point of this post.)

Time to click Pub­lish and start jab­bing my F5 key…

An introduction

Hel­lo, Inter­net. It’s Everett, and I’m blog­ging. I’m sort of new at this.

And at the same time, I’m not.

See, it was 2001 when I first became aware of the fact that people on the Web were writing regularly updated, reverse-chronological content about what they had for breakfast. I was a college freshman. I took up my keyboard and started a blog[1] that no longer exists, on a service that I didn’t like very much (but that is still around today).

After a few months there, I start­ed a Live­Jour­nal that exists to this day, but hasn’t been reg­u­lar­ly updat­ed in a num­ber of years. I was once a paid user of Live­Jour­nal, an acknowl­edged con­trib­u­tor to the project and, sim­ply, a humon­gous fan.

Some­thing changed in my life, a few years lat­er, around the time I fin­ished col­lege. Per­haps I no longer felt the need to tell the world what I was hav­ing for break­fast (of course, today that’s Twitter’s job), or maybe my life got a lot less note­wor­thy (if it had ever been). Maybe LiveJournal’s mul­ti­ple changes in own­er­ship tar­nished its image. Or maybe all the cool kids moved on to pure social net­work­ing ser­vices, which were com­ing of age at that point.

It was probably a combination of these things, plus another big one: I was hired to work in a public-facing role at blogging/social networking/photo sharing/etc. service extraordinaire Multiply.com. To be clear, Multiply didn’t silence me; I made sure I was allowed to continue blogging elsewhere before taking the position. But having a real job, one that had me, among other things, blogging, simply wasn’t conducive to after-hours blogging.

With all of this in the past, I think it’s time I start blog­ging again. Everyone’s cat has a blog, in which they dis­cuss what they ate for break­fast, so why don’t I?

Okay, now I do.

  1. Though I was at the time unaware of the term “blog,” which was by no means in com­mon use in 2001