UI Researcher Studies Model for Better Web Search Engine Results

While search engines such as Google and Yahoo are effective tools for the typical World Wide Web user, there is still room to improve how well they find pages related to a subject of interest. Recently published research by Filippo Menczer, an assistant professor of Management Sciences at the University of Iowa, explores mathematical models that could help engineers of Internet search engines create better ways to hunt down the pages that Web users want to access.

In his paper, "Growing and navigating the small world Web by local content," published in the Oct. 29 issue of Proceedings of the National Academy of Sciences, Menczer examined a sample of 150,000 Web pages, studying the relationships between text, links and meaning. He analyzed almost 4 billion pairs of pages for similarity. From this large body of data, Menczer discovered a mathematical power-law relationship between link probability and similarity of language across Web pages.
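
For readers curious about what a power-law relationship looks like in practice, the following is a minimal sketch in Python, not the method used in the paper. It assumes a relationship of the form p = c * s**k between a hypothetical similarity score s and a link probability p, and estimates k and c with a straight-line fit on log-log axes. The function name, the data values, and the exponent are all made up for illustration; the article does not report the actual exponent or the exact measure of similarity.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**k by least squares on log-log axes; return (k, c).

    Hypothetical helper for illustration. All inputs must be positive.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # On log-log axes a power law is a straight line: log y = log c + k * log x.
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    k = sxy / sxx
    c = math.exp(mean_y - k * mean_x)
    return k, c


# Synthetic data only (assumed values, not taken from the paper).
similarity = [0.1, 0.2, 0.4, 0.8]
link_probability = [0.3 * s ** 1.5 for s in similarity]

k, c = fit_power_law(similarity, link_probability)
print(f"exponent k = {k:.3f}, prefactor c = {c:.3f}")
```

Because the synthetic data follow an exact power law, the fit recovers the exponent and prefactor used to generate them. Real measurements would be noisy, and a careful analysis would likely need more robust estimation than a simple log-log regression.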

His model is the first to give accurate predictions of Web link structure and growth based on the content of Web pages. Other Internet models have assumed that a Web page author knows the popularity of every Web site and chooses links based on that knowledge. Menczer says that authors instead link to the best and most popular pages within the same category. This creates a small Web among pages with similar topics, such as books or a hobby. Menczer's model of this process closely matches what is seen in the real Internet.

This model may help Internet developers gain a better understanding of the evolving structure of the Web and its cognitive and social underpinnings. This may, in turn, lead to more effective authoring guidelines as well as improved ranking, classification, and clustering algorithms used in Web search engines.

"Hopefully, by analyzing the relationship between meanings of a page, links, and words, we will be able to determine how to use these cues to find better search results," said Menczer.

The National Science Foundation (NSF) funds Menczer's research. He is a recipient of one of the NSF's most prestigious awards, the Early Career Development Award, which provides project support over a five-year period. The award recognizes and supports the early career-development activities of those teacher-scholars who are most likely to become the academic leaders of the 21st century.

Menczer and his students developed the MySpiders system, which allows users to launch personal adaptive agents that search the Web on their behalf. Menczer and his Adaptive Agents Research Group at the University of Iowa pursue interdisciplinary research projects in Web, text, and data mining, Web intelligence, distributed information systems, adaptive intelligent agents, evolutionary computation, machine learning, and agent-based computational economics.

For a copy of Menczer's PNAS paper, phone University News Services at (319) 384-0012 or send an e-mail to george-mccrory@uiowa.edu.

