USS Clueless - Dystopian Xanadu

Stardate 20020521.1559

(On Screen): Chris suggests the possibility of a new way of indexing the web, where web users install a plugin and actively contribute information to a database:

When an ant is exploring it doesn't lay down a scent trail, and neither should web surfers (due to spyware or otherwise). But we propose a new breed of tracking software that could be plugged-in, or built-in to the next generation of web browsers: when a surfer finds a page of interest they could summon their personal browsing history, check-off the pages they think are related to it, and then go on their merry way. The voluntary electronic scent trail left behind could be exploited by a third party—maybe a search engine—that builds a new navigation layer on top of the existing web. Any other Web citizen who follows the same path will reinforce it anonymously.

Alas, it wouldn't work. Like all idealists, Chris assumes that everyone involved would play fair.

HTML permits a list of keywords in a META tag in the page header: words and phrases which relate to the content of the page, placed there for the benefit of search engines trying to index the site. None of the important search engines pay attention to them anymore because of rampant abuse by sex sites.
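To make concrete what that looks like, here's a minimal sketch of a trusting indexer reading the tag with Python's standard-library parser; the sample page, its keyword list, and the KeywordReader class are invented for illustration.

```python
from html.parser import HTMLParser

# Invented sample page: the keywords META tag sits in the header and is
# invisible to anyone viewing the page normally.
PAGE = """<html><head>
<title>Quilting Tips</title>
<meta name="keywords" content="quilting, sewing, fabric, patterns">
</head><body>...</body></html>"""

class KeywordReader(HTMLParser):
    """Naive indexer: trusts whatever the page author put in the tag."""
    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "keywords":
            self.keywords += [k.strip() for k in attrs.get("content", "").split(",")]

reader = KeywordReader()
reader.feed(PAGE)
print(reader.keywords)   # ['quilting', 'sewing', 'fabric', 'patterns']
```

The tag is self-reported by the page author, which is exactly why it was so easy to abuse.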

For a while, you could dump the source of a porn site, look at its meta keyword list, and find all manner of things completely unrelated to what the page actually contained (pictures of nekkid wimmen), placed there solely to get the sex page to show up in searches for other kinds of sites. At a certain point the pool of meta keywords became so polluted that noise drowned out signal, and Darwinian selection kicked in: search engines that still paid attention to them turned up huge numbers of bogus hits, while engines that ignored them gave better results and became more successful.

The Google people have been involved in an unpublicized running battle with porn sites that keep trying to figure out ways to boost their relevance in Google's search results. One trick is to create a whole hell of a lot of small sites which cross-link each other, all of which also link to the one main page the porn-spammer is trying to get boosted. The spammer then submits a few of these to the search engine, which finds the rest by spidering, and the result looks as if a lot of independent places link to the primary site for a certain subject. For a while it worked; eventually Google's engineers got wise and started paying attention to things like the fact that all of those hundreds of sites lived under a small handful of actual primary URLs and thus belonged to the same people. It's been an arms race, with Google's engineers challenged to separate the sites which are truly relevant from those which merely want to be seen as relevant in order to drive traffic.
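To illustrate the kind of filtering involved, here's a rough sketch; it is not Google's actual algorithm, which is unpublished, and the link data and function names are invented. The idea is simply that inbound links grouped by the host serving them make hundreds of farm pages under one primary URL count about as much as a single independent site.

```python
from urllib.parse import urlparse

# Invented inbound-link data: many "different" pages vouching for the
# spammer's main page, but nearly all hosted under one primary URL.
inbound_links = (
    [f"http://linkfarm.example/page{i}.html" for i in range(300)] +
    ["http://somecollege.edu/~prof/links.html",
     "http://hobbyist.example.org/favorites.html"]
)

def naive_score(links):
    # Every inbound link counts equally -- the link farm wins.
    return len(links)

def grouped_score(links):
    # Count each owning host only once, so 300 farm pages add about as
    # much weight as one independent site.
    return len({urlparse(link).netloc for link in links})

print(naive_score(inbound_links))    # 302
print(grouped_score(inbound_links))  # 3
```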

There's no reason to believe that this kind of thing wouldn't also happen with the proposed mechanism. Chris's idea would work beautifully only as long as everyone contributing to it played fair. But how hard would it be for someone to analyze the communication protocol the plugin used and start flooding the central index with bogus, irrelevant "relevant" links? "Hey; 95,000 people just told us that they thought that Newsweek was strongly related to Shirley's Randy Barnyard Friends." Or one liar did, 95,000 times...
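To show how cheap that flood would be, here's a minimal sketch. Since no such plugin or protocol actually exists, the payload format, field names, and URLs are pure assumption; the point is only that an anonymous, voluntary protocol can't tell one liar from 95,000 honest surfers.

```python
import json

# Hypothetical payload the plugin might send: "these two pages are related."
def fake_vote(src, dst):
    return json.dumps({"user": "anonymous", "related": [src, dst]})

# One liar, 95,000 "independent" votes, generated in a fraction of a second.
bogus = [fake_vote("http://www.newsweek.com/",
                   "http://spam.example/barnyard.html")
         for _ in range(95_000)]

print(len(bogus), "votes ready to submit")
```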

