IEEE

People Who Read This Article Also Read... (Continued)
By Greg Linden

First Published March 2008

GOOGLE NEWS, arguably the leading site when it comes to personalized news, goes several steps further, automating things as much as possible. For example, it uses a technique called implicit personalization to recommend different content to each reader, based on the reader’s past behavior. It’s an innovation that suggests a way forward for the news business. But first, consider how Google News accomplishes two other seemingly simple automation chores: ranking and clustering stories.

Google is, of course, famous for its method of ranking search results. In the case of news, it forms an understanding of which stories are generally the most interesting and important and continually updates a reader’s personal home page with that in mind. Google News collects millions of articles from thousands of sources, so it would be out of the question to use a staff of editors to lay out the front page, as most news sites do. Krishna Bharat, who led the development of Google News, says that its algorithm ranks stories according to the authority of the news source, the timeliness of the article, whether the article is an original piece, where the article was originally placed by the editors on the source Web site, the apparent scope and impact on readers, and the popularity of the article.
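One way to picture a ranking of this kind is as a weighted sum over the factors Bharat lists. The sketch below is only an illustration under stated assumptions: the article does not say how Google News combines the factors, so the `Story` fields, the 0-to-1 scales, and every weight here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    # Each factor is assumed to be pre-scaled to the range 0..1.
    source_authority: float
    timeliness: float
    originality: float
    source_placement: float
    scope_and_impact: float
    popularity: float


# Hypothetical weights, chosen only for illustration; they sum to 1.
WEIGHTS = {
    "source_authority": 0.25,
    "timeliness": 0.25,
    "originality": 0.15,
    "source_placement": 0.10,
    "scope_and_impact": 0.15,
    "popularity": 0.10,
}


def score(story: Story) -> float:
    """Weighted sum of the six ranking factors named in the article."""
    return sum(weight * getattr(story, name) for name, weight in WEIGHTS.items())


def rank(stories: list[Story]) -> list[Story]:
    """Return stories ordered from highest to lowest score."""
    return sorted(stories, key=score, reverse=True)
```

A linear combination is the simplest choice that lets every factor contribute. A production system could just as well learn the weights or combine the factors nonlinearly.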


Google News also clusters stories on the same news event. Clustering gives readers the benefit of diversity, which is particularly useful to readers of international news. For example, a French paper might take a pro-farmer stance when covering a trade dispute on European Union farming subsidies, while a British newspaper might have a very different view. Another advantage of clustering is that it can either eliminate or call explicit attention to duplicate articles, such as when two newspapers run the same Associated Press wire story.

But the task of clustering news stories on the same event encompasses several subchores, some of them fairly difficult. One of them is simply defining what we mean by “same event,” an ill-defined, surprisingly hard problem. For instance, stories about the escape of a tiger from the San Francisco Zoo in December 2007 included articles on how the animal may have gotten free, how it killed a visitor, how it mauled two other people, and how it was itself killed by police officers. Are they all the same event?

Google News tackles this problem by using a technique called hierarchical agglomerative clustering. Basically, it puts news articles with similar phrasing together into distinct piles. It starts by analyzing the content of articles to find those that share keywords or key phrases; articles that have enough language in common are assumed to be covering similar topics. The articles in each pile are connected based on the strength of their similarity. To visualize these connections, imagine a treelike structure where the articles are the leaves. If we grab a branch from the tree, the many leaves on that branch are all similar articles—that is, articles about the same general event. Thus a group of leaves near one another on a branch of the tree constitutes a cluster.
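The idea can be sketched in a few lines. This is a minimal illustration of hierarchical agglomerative clustering, not Google's implementation: the keyword extraction, the Jaccard similarity measure, the average-linkage rule, the stopword list, and the 0.2 threshold are all assumptions made for the example.

```python
import re
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "to", "and",
             "is", "was", "by", "from", "after", "it", "how"}


def keywords(text):
    """Lowercase words in the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Shared keywords divided by all keywords in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_similarity(c1, c2, kw):
    """Average linkage: mean similarity over all cross-cluster article pairs."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(jaccard(kw[i], kw[j]) for i, j in pairs) / len(pairs)


def agglomerate(texts, threshold=0.2):
    """Repeatedly merge the two most similar clusters until none is similar enough.

    Returns a list of clusters, each a sorted list of indices into `texts`.
    """
    kw = [keywords(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]  # every article starts as its own leaf
    while len(clusters) > 1:
        best_sim, (a, b) = max(
            (cluster_similarity(clusters[a], clusters[b], kw), (a, b))
            for a, b in combinations(range(len(clusters)), 2)
        )
        if best_sim < threshold:
            break  # the remaining piles are too different to join
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]
```

Each merge corresponds to joining two branches of the tree, and stopping at the threshold corresponds to cutting the tree at a chosen height. Given two headlines about the zoo tiger and two about European Union farming subsidies, this sketch would be expected to return two piles, one per event.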

This tree is constantly changing. As more and more stories accrue on a general event, the threshold for determining whether any two of those stories are about the same aspect of that event becomes higher. The clusters may shift, with articles jumping to new groups and old groups splitting or combining. The groupings adapt to the news available, which is always changing.
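A rising threshold of this kind could look something like the following. The article gives no formula, so the logarithmic shape and the `base`, `step`, and `cap` values are purely hypothetical; the sketch only shows how the same pair of stories can be grouped together in a small event and separated in a large one.

```python
import math


def split_threshold(n_stories, base=0.2, step=0.05, cap=0.8):
    """Similarity two stories need to count as the same aspect of an event.

    Grows with the number of stories on the event, up to a ceiling.
    All parameter values are illustrative assumptions.
    """
    return min(cap, base + step * math.log2(max(n_stories, 1)))


def same_aspect(similarity, n_stories):
    """True if two stories are similar enough, given how crowded the event is."""
    return similarity >= split_threshold(n_stories)
```

With these assumed values, two stories whose similarity is 0.3 stay together when only 2 stories cover the event but fall into separate groups once 256 stories have accumulated.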

If the ideal result is a newspaper featuring the news you want to see, these clustering and ranking strategies can take you only so far. They can determine whether a new development in a story you’ve been following is something that might interest you. But they can’t make a logical leap, such as recognizing from your previous interest in articles on the search for extraterrestrial intelligence that you would be fascinated by the discovery of an Earth-like planet in another solar system.

