Hot Topics in Web Search
Web 2.0 is a buzzword that appeared recently to describe a set of changes in the Web, notably:
- User-editable content, collaborative work and social networks (e.g., in blogs, wikis such as Wikipedia, and social network Web sites like MySpace and Facebook);
- Aggregation of content from multiple sources (e.g., from RSS feeds) and personalization, as offered, for instance, by Netvibes or Yahoo! Pipes.
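As an illustration of feed aggregation, here is a minimal Python sketch. It assumes two small RSS 2.0 feeds inlined as strings, and a keyword filter in `aggregate` that stands in for personalization. The feeds, their titles, and the function names are hypothetical, and the sketch is not meant to reflect how Netvibes or Yahoo! Pipes are implemented. It merges the items of all feeds into a single stream, newest first, using only the standard library.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two hypothetical RSS 2.0 feeds, inlined so the example runs offline.
FEED_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>Post A1</title><link>http://a.example/1</link>
<pubDate>Mon, 01 Jan 2024 10:00:00 +0000</pubDate></item>
<item><title>Post A2</title><link>http://a.example/2</link>
<pubDate>Wed, 03 Jan 2024 09:00:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>Post B1</title><link>http://b.example/1</link>
<pubDate>Tue, 02 Jan 2024 12:00:00 +0000</pubDate></item>
</channel></rss>"""


def parse_items(feed_xml):
    """Yield (date, channel title, item title, link) for every <item> of a feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title")
    for item in channel.iter("item"):
        yield (parsedate_to_datetime(item.findtext("pubDate")),
               source, item.findtext("title"), item.findtext("link"))


def aggregate(feeds, keyword=None):
    """Merge all feeds into one list, newest first, optionally keeping only
    items whose title contains `keyword` (a crude form of personalization)."""
    items = [it for feed in feeds for it in parse_items(feed)]
    if keyword is not None:
        items = [it for it in items if keyword.lower() in it[2].lower()]
    return sorted(items, key=lambda it: it[0], reverse=True)


for date, source, title, link in aggregate([FEED_A, FEED_B]):
    print(date.date(), source, title)
```

Sorting on the parsed `pubDate` rather than on feed order is what turns several independent feeds into one interleaved stream.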
- The deep Web (also known as the hidden Web or invisible Web) is the part of Web content that lies in online databases, is typically queried through HTML forms, and is not usually accessible by following hyperlinks.
- As classical crawlers only follow these hyperlinks, they do not index the content that is behind forms.
- There are hundreds of thousands of such deep Web services, some of which provide very high-quality information: all Yellow Pages directories, information from the U.S. Census Bureau, weather or geolocation services, and so on.
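To see why link-following crawlers miss deep Web content, the following minimal Python sketch analyzes a hypothetical search page. Its only hyperlink leads to an "About" page, while the data itself is reachable only by submitting the GET form with values. The page, the base URL `directory.example`, and the input values are assumptions for illustration. Real deep Web crawling must also choose meaningful values for each field, which this sketch does not attempt.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Hypothetical search page of a deep-Web directory service.
PAGE = """<html><body>
<a href="/about.html">About</a>
<form action="/search" method="get">
  <input type="text" name="city">
  <select name="category"><option value="plumber">Plumber</option></select>
  <input type="submit" value="Go">
</form>
</body></html>"""
BASE = "http://directory.example/"  # hypothetical site URL


class PageAnalyzer(HTMLParser):
    """Collect hyperlinks, plus the action and field names of GET forms."""

    def __init__(self):
        super().__init__()
        self.links, self.forms = [], []
        self._form = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self._form = {"action": attrs.get("action") or "", "fields": []}
            # Only GET forms map directly onto a URL with a query string.
            if (attrs.get("method") or "get").lower() == "get":
                self.forms.append(self._form)
        elif tag in ("input", "select", "textarea") and self._form is not None:
            if attrs.get("name") and attrs.get("type") != "submit":
                self._form["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._form = None


analyzer = PageAnalyzer()
analyzer.feed(PAGE)

# What a classical crawler sees: hyperlink targets only, none of them search results.
crawl_frontier = [urljoin(BASE, href) for href in analyzer.links]

# What a deep-Web crawler must do: fill in the form fields and build the query URL.
form = analyzer.forms[0]
values = {"city": "Boston", "category": "plumber"}  # hypothetical input values
query_url = urljoin(BASE, form["action"]) + "?" + urlencode(
    {name: values[name] for name in form["fields"]})

print(crawl_frontier)  # ['http://directory.example/about.html']
print(query_url)       # http://directory.example/search?city=Boston&category=plumber
```

The result page at `query_url` never appears as a hyperlink anywhere, so a crawler that only follows `<a href>` targets has no way to discover it.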
- Classical search engines do not try to extract information from the content of Web pages. They only store and index them as they are.
- This means that the only kind of query they can answer is a keyword query, and the results they return are complete Web pages.
- The purpose of Web information extraction is to provide the means to extract structured data and information from Web pages, so as to be able to answer more complex queries.
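A minimal sketch of the idea, assuming a hypothetical listing page whose records sit in an HTML table. A hand-written wrapper (the hypothetical `TableWrapper` class, written for this one layout) turns the rows into typed records. Those records then support a structured query that a keyword search returning whole pages cannot express. The page contents are made up, and real extraction systems must cope with many layouts and much noisier markup.

```python
from html.parser import HTMLParser

# Hypothetical listing page; titles, years and prices are made up.
PAGE = """<table>
<tr><th>Title</th><th>Year</th><th>Price</th></tr>
<tr><td>Book A</td><td>2011</td><td>$60.00</td></tr>
<tr><td>Book B</td><td>2002</td><td>$45.50</td></tr>
<tr><td>Book C</td><td>1995</td><td>$12.00</td></tr>
</table>"""


class TableWrapper(HTMLParser):
    """Turn each <tr> into a list of its <th>/<td> cell strings."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_records(html):
    """Use the header row as field names and type-convert the data cells."""
    wrapper = TableWrapper()
    wrapper.feed(html)
    header, *body = wrapper.rows
    records = []
    for cells in body:
        rec = dict(zip(header, cells))
        rec["Year"] = int(rec["Year"])                  # site-specific rule
        rec["Price"] = float(rec["Price"].lstrip("$"))  # site-specific rule
        records.append(rec)
    return records


records = extract_records(PAGE)
# A structured query a keyword search cannot express:
# titles published after 2000 that cost less than $50.
cheap_recent = [r["Title"] for r in records if r["Year"] > 2000 and r["Price"] < 50]
print(cheap_recent)  # ['Book B']
```

The type conversions for `Year` and `Price` are the site-specific part of the wrapper; a page from another site would need different rules.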