Mining the Deep Web
Although search engines like Yahoo!, Bing and Google index billions of web pages and other electronic documents, this represents only a tiny part of the total information available on the World Wide Web. To unearth the buried treasure, you have to understand how to mine the data.
Two Layers of Data
Think of the Web as having two layers: a shallow surface and an almost bottomless, deep level. In the top layer, the Surface Web, you will find web pages like the one that you're now reading. This page and others like it have fixed web addresses or URLs (in this case, http://www.learnthenet.com/how-to/search-the-deep-web). The information contained in such a page also doesn't change very often.
The Deep Web contains pages with dynamic content--data that changes frequently and can't be indexed easily by search engines. Most of this information is stored in databases and is assembled "on the fly" when you query the database. For instance, when you search for an item on eBay, information is pulled from eBay's database and instantly assembled on a web page for you. That page did not exist until you performed your search, which is what makes it dynamic; it was customized in response to your query. Because the page is generated only in response to a query, search engines, which discover pages by following links, can't readily index this information.
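To make the idea of a page assembled "on the fly" concrete, here is a minimal sketch in Python. It is only an illustration, not how eBay or any real site works: the in-memory SQLite database, the `items` table, the sample rows, and the function names `build_catalog` and `render_search_page` are all assumptions invented for this example.

```python
import sqlite3
from html import escape


def build_catalog():
    """Create a tiny in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (title TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [
            ("Vintage camera", 45.0),
            ("Camera tripod", 19.5),
            ("Road bike", 220.0),
        ],
    )
    return conn


def render_search_page(conn, query):
    """Build an HTML page for one search query.

    The page is not stored anywhere beforehand; it exists only as the
    string returned by this call.
    """
    rows = conn.execute(
        "SELECT title, price FROM items WHERE title LIKE ? ORDER BY title",
        (f"%{query}%",),
    ).fetchall()
    listing = "".join(
        f"<li>{escape(title)}: ${price:.2f}</li>" for title, price in rows
    )
    return (
        f"<html><body><h1>Results for {escape(query)}</h1>"
        f"<ul>{listing}</ul></body></html>"
    )


conn = build_catalog()
page = render_search_page(conn, "camera")
print(page)
```

Each different query produces a different page, and none of them exists until someone asks for it. A crawler that only follows links between existing pages has no way to reach these results unless it submits the query itself.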
Other types of "deep" information include:
- Multimedia (audio, music and video)
- Photos and graphics
- Job listings
- Financial data (stock and bond prices, currency rates)
- Travel-related data (airline and train schedules)
- Information on sites that require passwords