The current state of the art in comment spam

Write, geek! gets a fair amount of spam replies. This surprised me at first, when it began happening almost immediately after the blog was set up and content was posted. I should have known better; there’s almost no cost to spammers in spamming even unpopular blogs, so why would they make an exception for mine?

I’m using the Akismet plugin for WordPress, so it’s not like any of these comments actually make it to my blog. In fact, I’d never even have to see them if I didn’t regularly clean them out of my spam folder by hand. I do this partly to make sure nothing legitimate gets filtered incorrectly (which happens sometimes) and partly because I like to sort of keep tabs on the current ‘state of the art’ in spamming.

The current state of the art in spamming is this: the comments are getting better. Comments jam-packed with dozens of links are no longer commonplace (one particular default WordPress setting probably made those almost 100% ineffective); they’ve largely been replaced with comments that masquerade as… actual comments!
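For the curious, here’s roughly how a link-count rule like that works. WordPress’s Discussion settings include an option to hold a comment in the moderation queue if it contains a given number of links or more; the sketch below is a rough Python approximation of that idea, not WordPress’s own code, and the threshold of 2 and the sample comments are illustrative assumptions.

```python
import re

# Assumed threshold, mirroring the "hold a comment in the queue if it
# contains N or more links" option; 2 is an illustrative value.
MAX_LINKS = 2

# Counts HTML anchor tags that carry an href attribute. A rough
# approximation for illustration, not WordPress's implementation.
LINK_PATTERN = re.compile(r"<a\s[^>]*href", re.IGNORECASE)


def count_links(comment: str) -> int:
    """Return the number of <a href=...> tags in a comment body."""
    return len(LINK_PATTERN.findall(comment))


def should_hold_for_moderation(comment: str, max_links: int = MAX_LINKS) -> bool:
    """Hold the comment if it has max_links or more links."""
    return count_links(comment) >= max_links


# Old-school link farm vs. new-school flattery (both made up).
link_farm = "".join(f'<a href="http://spam.example/{i}">deal {i}</a> ' for i in range(12))
flattery = "Excellent read, I just passed this onto a colleague."

print(should_hold_for_moderation(link_farm))  # True
print(should_hold_for_moderation(flattery))   # False
```

A link-free, flattering comment sails straight past a rule like this, which goes some way toward explaining the change in tactics.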

The idea of noise disguised as signal is nothing new if you’ve used e-mail in the last 15 years, but that the noise is getting better (read: more difficult for humans to detect) is somewhat surprising. Of course, these comments are no match for a large, distributed system like Akismet, which all-knowingly sees what’s being posted to what are probably millions of blogs. But the well-disguised, largely pseudo-flattering comments are probably now designed to get human blog authors to click the “Not Spam” button, freeing the comments from the spam box so that they can do their SEO-based dirty work.
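Under the hood, a plugin like Akismet sends each new comment to Akismet’s REST API, and, as I understand it, clicking “Not Spam” is reported back as a false positive. Here’s a minimal sketch of building those two requests in Python, assuming Akismet’s 1.1 `comment-check` and `submit-ham` endpoints with the API key as a subdomain. The key, blog URL and commenter details are placeholders, and nothing is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_KEY = "your-api-key"            # placeholder, not a real key
BLOG_URL = "https://blog.example/"  # placeholder blog address


def akismet_request(endpoint: str, comment: dict) -> Request:
    """Build a POST to an Akismet 1.1 endpoint such as comment-check or submit-ham."""
    url = f"https://{API_KEY}.rest.akismet.com/1.1/{endpoint}"
    # Every call identifies the blog, followed by details about the comment.
    body = urlencode({"blog": BLOG_URL, **comment}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


comment = {
    "user_ip": "203.0.113.7",  # documentation-range IP address
    "user_agent": "Mozilla/5.0",
    "comment_type": "comment",
    "comment_content": "Excellent read, I just passed this onto a colleague.",
}

check = akismet_request("comment-check", comment)  # "is this spam?"
ham = akismet_request("submit-ham", comment)       # what "Not Spam" reports back
print(check.full_url)
# https://your-api-key.rest.akismet.com/1.1/comment-check
```

Actually sending one would be `urllib.request.urlopen(check)`; for `comment-check`, Akismet replies with a body of `true` (spam) or `false` (not spam).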

Of course, gentle readers, I’m far too smart to fall for that, but not so blinded by my hatred for spam that I can’t appreciate a well-crafted work of authorship, like this one I just found:

Spam that reads "Excellent read, I just passed this onto a colleague who was doing a little research on that. And he actually bought me lunch because I found it for him smile So let me rephrase that: Thanks for lunch!"

Sure, it’s not perfect, but someone out there put some modicum of thought into it, which is the least you could ask of the author of a work that’s going to be distributed on a massive scale.

Plus, it’s a lot better than this anti-gem I also just found:

Spam that reads "Why jesus allows this sort of thing to continue is a mystery"

Can you get more unintentionally self-referential than that? (No, you cannot… and yes, that was a challenge.)

Upgraded to WordPress 3.0

The old adage (which I think I made up) about spending more time geeking around with a WordPress installation than actually writing in the damned blog holds true, ladies and gentlemen.

I just finished upgrading this fine blog to the newly stable WordPress 3.0.

In case you were wondering and/or sitting on the edge of your seats, I took great care to:

  1. Disable all of my plugins
  2. Dump a copy of my WordPress MySQL database using the aptly titled mysqldump
  3. tar a copy of my WordPress directory
  4. Do the upgrade!
  5. Re-enable the plugins one by one, making sure each works (or at least doesn’t break anything)
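For anyone wanting to script steps 2 and 3, here’s a minimal Python sketch. It only builds the `mysqldump` command line (the database name, user and file names are hypothetical) and uses the standard `tarfile` module in place of the `tar` command to archive the WordPress directory; adjust everything to your own setup.

```python
import os
import tarfile
from typing import List


def mysqldump_command(database: str, user: str, output_path: str) -> List[str]:
    """Build (but don't run) a mysqldump command line.

    A bare --password makes mysqldump prompt for the password interactively.
    """
    return [
        "mysqldump",
        f"--user={user}",
        "--password",
        f"--result-file={output_path}",
        database,
    ]


def archive_directory(src_dir: str, dest_path: str) -> str:
    """Write a gzip-compressed tarball of src_dir to dest_path (like `tar czf`)."""
    # Store entries under the directory's own name rather than its full path.
    arcname = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(dest_path, "w:gz") as tar:
        tar.add(src_dir, arcname=arcname)  # adds the directory recursively
    return dest_path
```

With made-up names, that would look like `subprocess.run(mysqldump_command("wordpress", "wp_user", "wordpress.sql"), check=True)` followed by `archive_directory("/var/www/wordpress", "wordpress-files.tar.gz")`. Prompting for the password keeps it out of your shell history.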

While I know not everyone is so lucky, I’m glad to see that everything appears to work here, because I’d be deathly embarrassed if, you know, Google’s or Bing’s web crawler came by and things weren’t looking up to my usual standards.

Why I don’t worry about blog stats, not even a little bit

I don’t obsess over this blog’s traffic stats. Doing so would be an example of kicking my own ass.

This graph is unimportant.

So, while I use both Google Analytics and the WordPress Stats plugin, I don’t care a whit about the numbers. I don’t even have to check them to know that they’re meaningless; they’re close enough to zero that they might as well be. (Words I’ve never spoken: “I had 12 pageviews today, up from 10. High and to the right, baby!”)

I can’t separate bot traffic from human traffic, and for all I know, I’m responsible for some incidental pageviews… at least if I happen to load pages when not signed in to WordPress. And why should I care about pageviews, anyway? It’s not like I’m looking to sell ads.

So why do I continue to use not one but two stats services if I ignore the numbers they give me? For the qualitative data. I can’t get enough of it.

My two favorites are referrers and search terms (which are themselves referrers, anyway). Both give me information that is actually useful, right now. Search terms tell me about cases where someone was looking for something and found my post’s title and/or summary promising enough to click through. And referrers, clearly, show me who (if anyone) is driving people my way.
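Since search terms really are just referrers, pulling them out is mostly URL parsing. A rough Python sketch, assuming the search engine puts the query in a query-string parameter (`q` for Google and Bing, `p` for Yahoo); the hostname table is a hypothetical and very incomplete example, and plenty of referrers carry no query at all.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical, incomplete table: search-engine hostname -> query parameter
# that carries the visitor's search terms.
SEARCH_PARAMS = {
    "www.google.com": "q",
    "www.bing.com": "q",
    "search.yahoo.com": "p",
}


def search_terms(referrer: str) -> Optional[str]:
    """Return the search terms embedded in a referrer URL, or None."""
    parsed = urlparse(referrer)
    # urlparse lowercases the hostname; it is None if there isn't one.
    param = SEARCH_PARAMS.get(parsed.hostname or "")
    if param is None:
        return None  # not a search engine we know about
    # parse_qs decodes %-escapes and turns "+" into spaces.
    values = parse_qs(parsed.query).get(param)
    return values[0] if values else None


print(search_terms("https://www.google.com/search?q=recently+learned+words+reappearing"))
# recently learned words reappearing
print(search_terms("https://example.org/blogroll"))  # None
```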

(Even in my past life on Multiply, I hooked my account up to Site Meter’s free service to see if it could show me any insightful stats. I looked through what it offered and found that all I really cared about were the referrers… which were, more often than not, hilarious. Web browser, OS and screen resolution can be interesting for seeing how my visitors stack up against Web users as a whole, but what am I going to do with that sort of insight? Fix IE6 CSS issues? Ha.)

The qualitative data that these services collect from my blog have shown me that people have found my post about the crappy Vivitar Clipshot, some even wondering if it’s OS X-compatible. (Hint: it isn’t.) A bunch of different search terms brought people to my logo/visual puns post. And one search that didn’t even logically match up with anything I’ve posted, “recently learned words reappearing,” gives me a great idea for a future post!

Should I be worrying more about appealing to the masses, or about creating the sort of content that people who actually do visit are interested in? That’s easy. The searches and referrers have shown me that (please cue the schmaltzy music) I’ve touched people’s lives… even if I didn’t necessarily give them anything of value, and perhaps even wasted their time with content that wasn’t relevant to their interests. I made a difference!

What’s all the PubSubHubbub hubbub?

Generally speaking, I’m a fan of emerging technologies and stuff like that. I just don’t always get them right off the bat.

I first heard of RSS/Atom in 2002 or 2003, around the time LiveJournal started actively pushing syndication by making feeds on journals discoverable. I looked upon these alien terms with interest, but also some confusion. Wait, I can subscribe to a blog? Why would I want to do that?

I know what I probably sounded like back then. Perhaps in a couple of years, I’ll be laughing at myself, wondering what I’d do without PubSubHubbub. Just perhaps.

For now, though, I’m not quite sure I get it. Since Google Reader now supports the protocol, I went ahead and found a WordPress plugin to enable it here on writegeek. I understand that to an RSS subscriber, it means faster or even near-instantaneous updates. And to a publisher, it means not only faster updates for one’s readers but also less load on the server, since millions of desktop feed readers won’t be regularly requesting one’s RSS file. (Not that that applies to me… yet.)
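To put rough numbers on the server-load argument, here’s some back-of-the-envelope arithmetic in Python. It assumes, hypothetically, 1,000 readers each polling the feed every 30 minutes, versus a single hub fetching the feed once per new post. It ignores conditional GETs, caching and aggregators that poll on behalf of many users, so treat it as illustration only.

```python
MINUTES_PER_DAY = 24 * 60


def polling_requests_per_day(readers: int, poll_interval_minutes: int) -> int:
    """Feed requests per day if every reader polls on a fixed interval."""
    polls_per_reader = MINUTES_PER_DAY // poll_interval_minutes
    return readers * polls_per_reader


def hub_fetches_per_day(posts_per_day: int, hubs: int = 1) -> int:
    """Simplifying assumption: each hub fetches the feed once per new post."""
    return posts_per_day * hubs


# Hypothetical numbers: 1,000 readers polling every 30 minutes, one post a day.
print(polling_requests_per_day(1000, 30))  # 48000
print(hub_fetches_per_day(1))              # 1
```

That’s 48,000 feed requests a day from polling, against roughly one fetch per post when a hub does the fan-out.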

Yeah, I’m a bit intrigued by the instant publishing, but I have a bunch of unanswered questions. Which servers should I be pinging? What motivates someone to run a server? What are their business models? A couple of years down the road, when they realize that they’re running the most popular servers but still aren’t making money, will they be putting ads in my feed? And I think I read something about servers talking to each other; how does that work?
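The first question at least has a documented answer: a feed advertises its hub with a `<link rel="hub" href="https://hub.example/">`-style element, and under the PubSubHubbub 0.3 spec a publisher notifies that hub by POSTing `hub.mode=publish` plus one or more `hub.url` topic URLs. Here’s a minimal Python sketch that builds (but doesn’t send) such a ping; the hub and feed URLs are placeholders.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_ping(hub_url: str, topic_urls: List[str]) -> Request:
    """Build a PubSubHubbub 0.3-style new-content notification for a hub."""
    # One hub.mode=publish field, then a hub.url field per updated feed (topic).
    fields = [("hub.mode", "publish")] + [("hub.url", url) for url in topic_urls]
    return Request(
        hub_url,
        data=urlencode(fields).encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder URLs: substitute the hub your feed advertises and your feed's URL.
ping = build_publish_ping("https://hub.example/", ["https://blog.example/feed/"])
print(ping.data)
# b'hub.mode=publish&hub.url=https%3A%2F%2Fblog.example%2Ffeed%2F'
```

Per that spec, a hub that accepts the notification should answer with `204 No Content` and then fetch the feed itself before pushing it out to subscribers.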

There seems to be nothing to lose, no lock-in, no single basket in which to place all of my proverbial eggs, so I’ll try it out. (That was basically the point of this post.)

Time to click Publish and start jabbing my F5 key…

An introduction

Hello, Internet. It’s Everett, and I’m blogging. I’m sort of new at this.

And at the same time, I’m not.

See, it was 2001 when I first became aware that people on the Web were writing regularly updated, reverse-chronological content about what they had for breakfast. I was a college freshman. I took up my keyboard and started a blog¹ that no longer exists, on a service that I didn’t like very much (but that is still around today).

After a few months there, I started a LiveJournal that exists to this day but hasn’t been regularly updated in a number of years. I was once a paid user of LiveJournal, an acknowledged contributor to the project and, simply, a humongous fan.

Something changed in my life a few years later, around the time I finished college. Perhaps I no longer felt the need to tell the world what I was having for breakfast (of course, today that’s Twitter’s job), or maybe my life got a lot less noteworthy (if it had ever been). Maybe LiveJournal’s multiple changes in ownership tarnished its image. Or maybe all the cool kids moved on to pure social networking services, which were coming of age at that point.

It was probably a combination of these things, plus another big one: I was hired to work in a public-facing role at blogging/social networking/photo sharing/etc. service extraordinaire Multiply. To be clear, Multiply didn’t silence me; I made sure I was allowed to continue blogging elsewhere before taking the position. But having a real job, one that had me, among other things, blogging, simply wasn’t conducive to after-hours blogging.

With all of this in the past, I think it’s time I started blogging again. Everyone’s cat has a blog in which it discusses what it ate for breakfast, so why don’t I?

Okay, now I do.

  1. Though I was at the time unaware of the term “blog,” which was by no means in common use in 2001.