A National Lampoon magazine cover from 1973 which was discussed quite controversially [Source: Wikipedia]

Contrary to popular belief, and contrary to the recently started controversial discussions about “Catvertising” and “Kittywood” (more on that topic a bit further down in this post), it is a widely ignored fact that searches for <<dog>> or <<“dog video”>> outnumber the searches for <<cat>> or <<“cat video”>>.

So: why the hell is everybody talking about “Cats on the Internet” then?

Certainly: Lolcats. A tremendously successful Internet meme from 2007, featuring photos of predominantly cute cats with image captions that bent the rules of orthography and syntax quite far (aka “lolspeak”; by now even academic papers are written about that topic). The result counts reflect that success: the search term <<I can has>> [this notation means: a search for all terms, no quotes involved] lists 8.740.000.000 results in Google. The same search as a phrase with quotes [notation: <<“I can has”>>, you have gotten the idea by now anyway, haven’t you?] brings up 44.300.000 results.

The corresponding website “I can has cheezburger” gets about 18.200.000 results, while its dog equivalent “I can has hotdog” lists about 58.100.000 results in Google. Funnily enough, the domain name for the cat version is icanhascheezburger.com, while the dog equivalent lives under dogs.icanhascheezburger.com.

Nevertheless, as the recent talk about “catvertising”, “repurrters”, and “kittywood” properly illustrates (here’s an introductory read about the topic from no less than TIME Magazine), journalists enjoy talking about “Cats on the Internet”. And even the “Kittywood” vs. “Catvertising” copycat controversy [pun intended, really!] can be properly resolved in time (try it here: Google Trends: Kittywood vs. Catvertising).

The exact count difference for “dog vs. cat” is not depicted below, but as these graphs from Google Trends show, the difference in relative search volume is quite significant:

Google Trends: world-wide searches for "cat" vs. "dog"

One thing that often comes up when the amazing popularity of cat videos on YouTube is discussed is “But they are soooooooo cute”. Well, they certainly are. But that goes for puppy videos as well, as the following checks across Google for different search terms show.

Combining the attribution of “cute” with the category searches of “puppy” or “kitten” repeats the findings for the initial category searches: around 5 million results for <<“cute puppy”>> vs. a bit over 3 million for <<“cute kitten”>>.
A side note: From the standpoint of a serious Web Analyst it is reasonable to assume that the success of the “cute [whatever]” search lies particularly in the exact match of a user’s intent with the delivered Search Engine results.

It is no surprise that search syntax matters a lot here, statistically speaking: as we can expect, a search for a single search term returns a load of results: <<kitten>> produces 138.000.000 results, <<puppy>> produces 239.000.000 results. The kitten-to-puppy result ratio for the generic terms (57.7%) is quite similar to the one for the more specific terms <<cat movie>> and <<dog movie>> (55.5%).
The term <<cute kitten>> returns 20.700.000 results (15% of the one-term search); the exact phrase search <<“cute kitten”>> returns 3.140.000 (2.3% of the one-term search).
For the puppies, it looks similar: 39.500.000 for <<cute puppy>> (that is 16.5% of the one-term-search) and 4.940.000 for <<“cute puppy”>> (that makes 2.1% of the initial one-term-search). So: the breakdowns are nearly identical, although the original volumes do differ quite a lot.
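For anyone who wants to re-run the percentages above, here is a minimal Python sketch. It only uses the result counts quoted in this post; the dictionary keys and the helper name `share` are made up for illustration. Values are rounded to one decimal place, so the last digit may differ slightly from a figure quoted in the text.

```python
# Result counts as quoted in this post (Google estimates at the time of writing).
counts = {
    "kitten": {"term": 138_000_000, "two_terms": 20_700_000, "phrase": 3_140_000},
    "puppy": {"term": 239_000_000, "two_terms": 39_500_000, "phrase": 4_940_000},
}


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of the two-term query and of the exact phrase, relative to the one-term search.
breakdown = {
    animal: {
        "two_terms_pct": share(c["two_terms"], c["term"]),
        "phrase_pct": share(c["phrase"], c["term"]),
    }
    for animal, c in counts.items()
}

# Ratio of the generic one-term searches: kitten results as a percentage of puppy results.
kitten_to_puppy_pct = share(counts["kitten"]["term"], counts["puppy"]["term"])

for animal, pct in breakdown.items():
    print(animal, pct)
print("kitten-to-puppy:", kitten_to_puppy_pct)
```

Printing the breakdown side by side makes the point of the paragraph visible at a glance: the absolute volumes differ a lot, the relative shares hardly at all.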

Using a more specific and complex search query showed some surprises, though: <<cute kitten sound>> brought up 30.800.000 results, the plural form <<cute kitten sounds>> returned about 1/3 of that (10.800.000), a search for the exact phrase/term combination <<“cute kitten” sound>> returned 3.740.000 results (12.1% of the unquoted query), while the exact phrase <<“cute kitten sound”>> returned 18.000 results (0.06%). Interestingly enough the plural form <<“cute kitten sounds”>> gave only slightly more results: 26.500 (0.09%).
Testing the same with the “puppy” query: a vastly larger number of results (gasp!) for the string <<cute puppy sound>>, but only 3.940.000 for the plural form (0.3%). The phrase/term combination <<“cute puppy” sound>> returned 4.880.000 (0.4%), while <<“cute puppy sound”>> returned only 5.320 results (that is literally nothing that could be helpfully expressed as a percentage any more) and <<“cute puppy sounds”>> returned 110.000 results (about 20 times as many as for the singular form of the search query, but still a value pretty close to nothing).
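The same kind of check works for the three-term queries. The sketch below is only an illustration built from the counts quoted above; the variable names are my own. Since the result count for the unquoted puppy query is not stated here, the puppy part is limited to the plural-to-singular factor of the exact phrase.

```python
# Kitten counts as quoted above; the base is the unquoted query <<cute kitten sound>>.
base = 30_800_000
kitten_variants = {
    "cute kitten sounds": 10_800_000,
    '"cute kitten" sound': 3_740_000,
    '"cute kitten sound"': 18_000,
    '"cute kitten sounds"': 26_500,
}

# Each variant as a percentage of the base query, rounded to two decimal places.
shares = {query: round(100 * n / base, 2) for query, n in kitten_variants.items()}

# Puppy exact-phrase counts as quoted above. The unquoted base count is not
# stated in the post, so no percentages are computed for the puppy queries.
puppy_phrase_singular = 5_320
puppy_phrase_plural = 110_000
plural_factor = round(puppy_phrase_plural / puppy_phrase_singular, 1)

for query, pct in shares.items():
    print(f"{query}: {pct}%")
print("puppy plural vs. singular phrase:", plural_factor)
```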

What do we learn from all this (besides that these numbers are huge)?
1. Puppies and dogs are more prominent on the Internet than cats and kittens. Except in articles about fluffiness and cuteness, and in relation to advertising agencies exploiting wordplay (“Catvertising”, “Repurrters”).
2. The neighbouring occurrence of the terms “cute”, “puppy”, and “sound” in documents on the Internet is A LOT more frequent than that of “cute”, “kitten”, and “sound” (about 42 times as many returns for the former). This heavily waters down relevant document retrieval for loosely coupled syntactic elements on target pages (worth a separate post).
3. The average number of kittens depicted in “cute kitten(s)” images is close to 1.04. Approximately. I guess. Hm. Directly derived from that:
4. It seems that the predominant cultural meme of “cats = freaky, but fluffy and cute individualistic creatures” is mirrored in Search Engine Result volumes as well: about 1.1 million findings from an image search for <<“cute kitten”>>, but only 331.000 for <<“cute kittens”>> (that makes a plural-to-singular rate of 29.5%, as opposed to 70.9% for the “cute puppy” case).
5. The old idea that search term combinations would be primarily handled based on elementary Boolean Operators has come to naught by now. Although the difference between a search for terms (the query is made without the quotes) and for a phrase (the sequence of search terms starts and ends with quotes) is still valid, Google search is both a lot more capable of analyzing AND a lot more dependent on the existence of strong syntactic ties in documents on the web.
Elaborating on that thought surely is worth another post, scheduled for early 2012.
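To make the terms-versus-phrase distinction from point 5 a bit more tangible, here is a toy Python sketch. It is explicitly NOT how Google works (that is the whole point of point 5); it only contrasts a plain Boolean AND over terms with an exact phrase match. The three sample documents and the function names are invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matches_terms(query, document):
    """Boolean AND: every query term occurs somewhere in the document."""
    doc_tokens = set(tokenize(document))
    return all(term in doc_tokens for term in tokenize(query))


def matches_phrase(query, document):
    """Phrase match: the query terms occur adjacently and in the given order."""
    q = tokenize(query)
    d = tokenize(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Hypothetical mini corpus, made up for this example.
docs = [
    "A cute kitten sound compilation",
    "The kitten made a cute little sound",
    "Cute puppy videos with sound",
]

term_hits = [doc for doc in docs if matches_terms("cute kitten sound", doc)]
phrase_hits = [doc for doc in docs if matches_phrase("cute kitten sound", doc)]

print("terms: ", term_hits)
print("phrase:", phrase_hits)
```

The term search accepts any document that contains all three words anywhere, while the phrase search only accepts the one document where they sit next to each other. That is the same loosely-coupled-versus-tightly-coupled gap that shows up in the result counts above.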

In case you couldn’t care less, or feel inclined to bridge the waiting time: why not go and watch some cute kitten videos in the meantime?