The invisible (hidden or deep) web is not reached through search engines. Instead, you must search directly on the website that holds the information.
Search engines cannot search the whole Internet. Whatever is available on the Internet but cannot be reached through search engines is called the invisible web (also known as the hidden or deep web). In some cases this is because search engines do not want to index “everything”. The biggest search engine has approximately 15-20% coverage.
Among other things, the invisible web consists of:
- Pages without links. Search engines (the spiders) cannot reach a page that no other page links to.
- Pages that require you to log in. Many websites demand a login even though their content is free.
- Hidden pages. With simple HTML code, a site owner can tell search engines not to index a page.
- Dynamic websites. Information that exists in databases and is published only as the answer to a search query, e.g.:
  - Reference databases
  - Fact search tools
  - Timetables
  - Weather forecasts
  - Library catalogues (exception: Yahoo has indexed Open WorldCat, approx. 2 million titles)
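The first and third items in the list can be illustrated with a small sketch. It assumes a made-up four-page site held in memory (the page names and the `SITE` structure are hypothetical, not taken from any real search engine) and follows links the way a spider does. The "noindex" flag stands in for the HTML code a site owner can use to reject indexing, commonly a tag such as `<meta name="robots" content="noindex">`.

```python
from collections import deque

# Hypothetical toy site: each page maps to (outgoing links, noindex flag).
SITE = {
    "/": (["/about", "/catalogue"], False),
    "/about": (["/"], False),
    "/catalogue": ([], True),    # owner has asked engines not to index it
    "/orphan": (["/"], False),   # exists, but no other page links to it
}

def crawl(site, start="/"):
    """Follow links breadth-first from the start page; return indexed pages."""
    seen = {start}
    queue = deque([start])
    indexed = []
    while queue:
        page = queue.popleft()
        links, noindex = site[page]
        if not noindex:
            indexed.append(page)
        for target in links:
            if target in site and target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed

indexed = crawl(SITE)
invisible = sorted(set(SITE) - set(indexed))
print("Indexed:  ", indexed)
print("Invisible:", invisible)
```

In this sketch the spider never learns that `/orphan` exists, and it visits `/catalogue` but leaves it out of the index, so both end up in the invisible part of the site.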
The invisible web also consists of pages that are not indexed for economic reasons. Search engines index to different depths, sometimes perhaps only one or two directory levels, and files further down are then left out. File formats that can be difficult to read, e.g. Flash, or audio and video files without relevant text markup, can also be said to be part of the invisible web.
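As a rough illustration of indexing depth, the sketch below counts how many directory levels a URL sits below the site root and keeps only the URLs within a chosen limit. The function names, the example.org addresses and the limit of two levels are assumptions made for the example; real search engines decide depth in their own ways.

```python
from urllib.parse import urlparse

def path_depth(url):
    """Count the directory levels between the site root and the file."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # The last segment is treated as the file name, not as a directory.
    return max(len(segments) - 1, 0)

def within_depth(urls, max_depth):
    """Keep only the URLs a depth-limited spider would still index."""
    return [u for u in urls if path_depth(u) <= max_depth]

urls = [
    "http://example.org/index.html",
    "http://example.org/guides/search.html",
    "http://example.org/guides/2005/june/deepweb.html",
]
kept = within_depth(urls, max_depth=2)
for u in urls:
    status = "indexed" if u in kept else "left out"
    print(path_depth(u), status, u)
```

With a limit of two directory levels, the third address lies three levels down and is left out, even though it is freely available on the site.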
As the Internet develops and new services and file formats appear, the capacity of the search engines also increases, or new ones are created. There are already search tools that search for and within blogs, RSS feeds and podcasts.
Eva Norling
2005-06-22

