Thoughts about Google's ranking
There has been some discussion of late about whether Google should ban weblogs from its default search results because they pollute the results list. The ramblings and heavy interlinking of bloggers, according to some, confuse the innocent searcher and lead him to nonsensical information.
Until yesterday (the 8th of June 2003) I did not agree; I was fairly satisfied with Google's methods. A person who searches Google for a particular topic is presumably interested in information about that subject. Of course there are loads of blogs with inane ramblings out there, but most of the ones I read have something to say when they post on a particular subject. As such, the results I would get from them on Google would interest me as a searcher, and I wouldn't want them to disappear to a different tab or be excluded automatically, only to be included when I tick a box in the preferences.
I said until yesterday because yesterday Google weblog posted this:
We're Number One!
It has been brought to our attention that we're the first hit for "weblog" on Google. Wow!
Now this gets interesting: Google weblog is a weblog; it says so in the title. But it is about Google. It's not about weblogs. It doesn't have anything to say about weblogs, and it isn't even a particularly good example of a weblog, because it's way too specialized.
This illustrates a fundamental flaw with Google. Google's ranking is a closed system. It is a black box, and we have no way to determine whether the algorithm in the box is alive or brain-dead.
Let's take a look at the top search-terms by which visitors find this site (95% of my search engine referrals come from Google):
Now for the most part these terms are OK: some page on this site is about the term. Half of the terms even have two keywords, and as someone who works in a library I can approve of this. There is a problem with items 5 (English term: schizoid), 11 (English term: nether-world) and 12 (English term: centipede). These terms are to be found in the section that contains some of the absurd writing from my past that I have put online. The words are in the documents, but the pages are not ABOUT the subject. I wonder very much whether someone searching for information about centipedes is really interested in, for example, my ramblings about some fictional character who has a centipede living in his ear. Wouldn't they rather see a page from a biology department or an online encyclopedia, with nice pictures and some information about the habitat of different species of centipede?
This gets worrying. It seems that Google leads people to inappropriate content. Part of this is of course due to the fact that Google is not a natural language processor. Nor is its ranking done on a case-by-case basis by a human skilled in the art of cataloguing.
Recently Google stopped paying attention to the keywords metatag; the reason given was that people often forgot to put them in anyway. This is a silly argument of course: the fact that loads of people just drop litter on the sidewalk is no reason to abandon public litter bins. Sure, I like to get a lot of visitors, but what use is it to me if these people only go away again immediately, frustrated?
Now let's be honest, a search engine isn't a library catalog. The goal of a library catalog is to provide a concise summary of a book so that a patron can determine what it's about and whether it would be useful to look at the book on the shelf and possibly borrow it. A catalog is meant to describe what a book is about and where it can be found; it is not meant to lead a searcher to a particular book. In this respect search engines are different. A lot of the terms entered in search engines are actually fully valid URLs. Why would someone search for www.hotmail.com when they could just as easily type it into the location bar? I blame poor UI design on the part of the browser and the fact that, to most people, web pages have become the browser. A web browser window is no longer a representation of a document inside a UI, but that is another subject.
I believe that Google should start paying attention to metatags right away. I also propose that pages with good metatags should get a higher ranking. Google has such a huge market share that it could easily leverage its position to better the web.
And it should immediately drop Google weblog from the number one spot when searching for the term weblog. This ranking illustrates that Google's algorithm is flawed to an extremely large extent.
In conclusion, I don't think Google should exclude weblogs from its default search results. It just needs to rethink its method of ranking pages. Heavy interlinking could be a problem, but I think its parser should attempt to understand natural language a bit better. Also: metatags were invented for a reason. They provide structure and more or less machine-readable data. Use it if it's there.
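To make the metatag idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the proposal and says nothing about how Google actually works: the names MetaKeywordParser and score, the keyword_boost factor and the scoring formula (the share of words on the page that match the search term) are all assumptions I made up for the example.

```python
from html.parser import HTMLParser
import re


class MetaKeywordParser(HTMLParser):
    """Collects the keywords metatag and the text of a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Only <meta name="keywords" content="..."> is of interest here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]

    def handle_data(self, data):
        # Text between tags, including the title.
        self.text_parts.append(data)


def score(html, term, keyword_boost=2.0):
    """Toy relevance score: term frequency, boosted if the author listed
    the term in the keywords metatag. The formula is an assumption."""
    parser = MetaKeywordParser()
    parser.feed(html)
    words = re.findall(r"[a-z]+", " ".join(parser.text_parts).lower())
    term = term.lower()
    base = words.count(term) / len(words) if words else 0.0
    return base * keyword_boost if term in parser.keywords else base
```

With this toy score, a page that merely contains the word centipede ranks below an otherwise identical page whose author also declared centipede as a keyword. A real engine would of course have to guard against people stuffing their metatags with popular terms, which this sketch does not attempt.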