Traffic

03Jan06

Pageviews from December 7th, when I added the Google Analytics code to my new WordPress site, thru the 31st:

  • Webalizer v2.01: 71,301
  • Urchin v5.7: 64,711
  • Urchin w/ Referral Filters: 32,605
  • Google Analytics: 8,704
  • Google Adsense: 6,944

March 1st, shortly after I migrated servers, thru December 31:

  • Webalizer v2.01: 476,233
  • Urchin v5.7: 306,052
  • Urchin w/ Referral Filters: 252,143
  • Google Adsense: 93,905

The Webalizer data is raw: whatever the Plesk-generated configuration file defines as a pageview. With Urchin I “normalize” the data to exclude things that aren’t really pageviews: feeds, back-end scripts, robots.txt, etc. In theory Urchin should not be counting anything that the Adsense and Analytics javascripts would be unable to count.
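A minimal sketch of that kind of normalization, written in Python rather than as Urchin filter configuration. The exclusion patterns and the `is_pageview` helper are assumptions for illustration; the actual filters are not listed here.

```python
import re

# Hypothetical exclusion patterns (assumed, not the real Urchin filters).
NON_PAGEVIEW_PATTERNS = [
    re.compile(r"/feed/?$"),                          # feed URLs
    re.compile(r"^/wp-(admin|includes|content)/"),    # back-end directories
    re.compile(r"^/wp-[\w-]+\.php"),                  # back-end scripts
    re.compile(r"^/xmlrpc\.php"),
    re.compile(r"^/robots\.txt$"),
    re.compile(r"\.(css|js|gif|jpe?g|png|ico)$", re.I),  # static assets
]

def is_pageview(path: str) -> bool:
    """Return True if the requested path should count as a pageview."""
    path = path.split("?", 1)[0]  # ignore the query string
    return not any(p.search(path) for p in NON_PAGEVIEW_PATTERNS)
```

Anything that passes `is_pageview` is a request a JavaScript-based counter could, in principle, also see.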

Is a huge portion of my visitors disabling or blocking javascript?

Could bots be generating all of this traffic?

Am I on the path to 1 million pageviews in 2006, or is it more like 150k?
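For a rough sense of the spread, the December 7th thru 31st counts (25 days, inclusive) can be scaled linearly to a 365-day year. This is a naive sketch that assumes flat traffic with no growth or seasonality; `annualize` is an illustrative helper, not part of any stats package.

```python
DAYS_OBSERVED = 25  # December 7th through 31st, inclusive

# Pageview counts from the first list in the post.
counts = {
    "Webalizer v2.01": 71301,
    "Urchin v5.7": 64711,
    "Urchin w/ Referral Filters": 32605,
    "Google Analytics": 8704,
    "Google Adsense": 6944,
}

def annualize(pageviews, days=DAYS_OBSERVED, year_days=365):
    """Linear projection of a partial-period count to a full year."""
    return round(pageviews * year_days / days)

for source, n in counts.items():
    print(f"{source}: {annualize(n):,}")
```

On these assumptions Webalizer projects to roughly 1.04 million pageviews a year and Google Analytics to about 127k.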

TODO: Figure out how to get Urchin Traffic Monitor working under Plesk.

Update: Added Urchin stats with some Referral Filters to eliminate spammer traffic. See the comments. Still have a ton of traffic that Adsense and Analytics can’t account for.

4 Responses to “Traffic”


  1. Phil Ringnalda Posted January 4th, 2006 - 12:37 am

    I wouldn’t be surprised if a fairly large portion of your visitors don’t load JavaScript because they don’t exist other than as zombies trying to (referrer) spam you: take out feeds, crawlers that admit to being crawlers, crawlers that pretend to be browsers, comment spammers who pretend to be browsers, and referrer spammers who pretend to be browsers, and I hardly see any traffic at all.

    Me, I should be in your Urchin and Webalizer stats but not in either Google set: they keep triggering my “there’s no reason for them to be doing that other than their desire to know every single thing that every single internet user does at all times” reflex, and every time they do I drop another URL or three in my hosts file.

  2. Bryce Posted January 4th, 2006 - 8:50 am

    In Urchin I added referral filters for about 45 words that I’m not going to post here lest the Googlebot come along and think I’m gaming them. That eliminated 40% of the unique referrers and I’ve added the pageview stats to the post. Referral stats are unfortunately by session… I checked them down into the single-digit occurrences, which leaves 15k sessions as potential spam. Spot checking suggests it is perhaps 2% of that, insignificant.

    Comment spammers aren’t generally going to show up because I filter out hits on back-end scripts. It’s been over a year since I looked into the spammers’ patterns but at the time their POSTs were never preceded by GETs. My suspicion was that they were using Google instead of directly crawling (see phpBB worm). I’ll save up some comment spam IPs this week and see if that’s still the case.

    On the bots front, Urchin does attempt to exclude their traffic from everything except its Robots report. What I don’t know is how it decides what is a bot. In the Browsers report I see many UAs that are clearly not human-driven, including the Adsense bot, but 75% of the raw hits are attributed to IE / Firefox / Safari / Opera.

    So… What else can I filter on?

  3. Bryce Posted January 4th, 2006 - 6:08 pm

    No need to wait for more spammers to stop by. Gather the logs starting with the date I switched to WordPress, grep for POST /wp-comments-post.php, strip the requests from IPs where the comment was approved, then grep each of the spamming IPs.
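The grep steps above can be sketched in Python. This is a sketch under assumptions: it expects Apache common/combined log format lines, and `approved_ips` is a hypothetical set of addresses whose comments were approved, gathered by hand.

```python
import re
from collections import defaultdict

# Start of a common/combined log format line: IP, ident, user, [time], "METHOD PATH
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)')

def spammer_requests(lines, approved_ips):
    """Map each suspected comment-spam IP to every (method, path) it requested."""
    parsed = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            parsed.append(m.groups())

    # IPs that POSTed to the comment script and were not approved.
    spam_ips = {
        ip for ip, method, path in parsed
        if method == "POST"
        and path.startswith("/wp-comments-post.php")
        and ip not in approved_ips
    }

    # Second pass: collect everything those IPs did, in log order.
    by_ip = defaultdict(list)
    for ip, method, path in parsed:
        if ip in spam_ips:
            by_ip[ip].append((method, path))
    return dict(by_ip)
```

Reading each IP's request list in order shows whether its POSTs were preceded by GETs and whether it fetched images and stylesheets.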

    Not a whole lot of data given the short timeframe, but it’s clear that the comment spamming bastards have evolved quite a bit from late 2004. All but one of the comment spammers were trying very hard to behave like a real UA, fetching images and stylesheets along the way. One is so realistic that I wondered if it might actually be IE-based… Apparently that one is Internet Business Promoter: the first few hits from that IP contained “IBP” in an otherwise IE-looking UA.

    Some of the comment bots are crawling my site. Others appear to be using Google and Yahoo search results to target keywords. A couple I suspect of targeting outbound links from popular sites.

    I should probably gather IPs from the referral spammers and take a good look at what they are doing, but that sounds like a whole bunch of effort. I got UTM going so the easy thing to do is just wait. The UTM-related data in my raw logs should help in identifying forged UAs that do not go “all the way” and provide a better idea of how (in)accurate Google’s data might be.
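One way to use that data, sketched under assumptions: UTM hits are taken to appear in the raw logs as requests for `/__utm.gif` (the `beacon_path` default below is that assumption), and the input is a list of already-parsed `(ip, path)` pairs. `ips_without_beacon` is an illustrative helper.

```python
def ips_without_beacon(requests, beacon_path="/__utm.gif"):
    """Return IPs that made requests but never fetched the tracking beacon.

    requests: iterable of (ip, path) tuples parsed from the raw logs.
    beacon_path: assumed location of the UTM beacon image.
    """
    seen, beaconed = set(), set()
    for ip, path in requests:
        seen.add(ip)
        if path.split("?", 1)[0] == beacon_path:
            beaconed.add(ip)
    return seen - beaconed
```

IPs in the result that claim a mainstream browser UA are candidates for forged UAs, though real visitors with JavaScript disabled would land in the same set.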

  4. Bryce Posted January 5th, 2006 - 7:22 am

    Here’s a thought.

    What if I create a TOS page that says posting messages of a commercial nature or for the purpose of web site promotion costs $5,000 each? Add some checkboxes to the comment form that say “I agree to the TOS” and “This message is not of a commercial or promotional nature.” Maybe add a CAPTCHA too, since a selling point for some of this software is that it will present CAPTCHA-enabled forms to the operator of the program.

    Then I look for the suckers running these programs from their home PCs and file lawsuits in small-claims court demanding my $5,000.

    And once I’ve got a judgement or settlement, file suit against the program’s makers for inducement to breach of contract, contributory trespass, and anything else that may be tenable…

