This chapter explores the power of Internet search engines for academic work and focuses specifically on Google, since this is often people's first port of call when confronted with a problem, query, or academic task. It shows you how to adopt a clever, strategic approach to your searching so that the items you retrieve from Google are suitable for the task at hand. It also goes behind the scenes of Google to explain how search results are ranked and how searches are becoming increasingly personalized for individual users.
The web is not the Internet, although many people (including some who should know better) often confuse the two. Neither is Google the Internet, nor is Facebook the Internet.
The new generation of Internet filters looks at things you seem to like – the actual things you’ve done, or the things people like you like – and tries to extrapolate. They are predictive engines, constantly creating and refining a theory of who you are and what you’ll do and want next. Together these engines create a unique universe of information for each of us – what I’ve come to call a filter bubble – which fundamentally alters the way we encounter ideas and information.
Table 3.1
Google Specialized Search Operators
filetype: | This is the same as the filetype function in Google Advanced Search. Combining this operator with an extension, such as pdf, doc, or ppt, instructs the search engine to retrieve only items in that particular format. For example, searching for “digital literacy” filetype:doc retrieves only Word documents that are about digital literacy. |
site: | This allows you to search within a particular site or domain. For example, the search string site:thetimes.co.uk instructs the search engine to search only the website of the UK newspaper The Times. |
inurl: | The inurl operator ensures that the search terms are found within a particular URL; for example, a search for inurl:digital means that Google will retrieve results where the word digital appears in the URL of the site. |
allinurl: | Where you are entering more than one search term, this operator ensures that they all appear in the URL of the sites retrieved. |
intitle: | The intitle operator allows you to specify that your search term should appear in the title of the webpage. |
allintitle: | This is the same as above, but is applied when you are entering more than one search term. |
inanchor: | The anchor text is the clickable text that you see on a webpage, which indicates a hyperlink. With this search operator, you can ensure that your search term appears in the anchor text. |
allinanchor: | As above, but with more than one search term. |
related: | This is a very useful operator when you come across a website that you really like. It allows you to find similar websites. For instance, related:bbc.co.uk searches for websites that are close to the BBC site and retrieves sites such as CNN.com, Reuters.com, and the Guardian website. |
link: | The link: operator allows us, in a way, to apply a PageRank of our own! link: finds all the pages that link to a particular website. This allows us to make judgments about the quality of a website; for example, if many high-quality sites provide incoming links to a website we are interested in, we can draw a reasonable conclusion that the website itself must be of good quality. |
define: | This operator asks Google for definitions of terms, which are gathered from pages on the Web, including Wikipedia. |
info: | The info: operator gives you information about a specific URL, including the cached version of the page, similar pages, and pages that link to the site. |
cache: | As we described previously, Google gathers information from webpages by “crawling” (following links from one page to the next), taking a “snapshot” of each page at a particular point in time before indexing the page content. This snapshot is also known as the cache, and this operator allows you to retrieve a particular webpage as it appeared the last time the Googlebot crawled it. You should remember that a page may have changed since it was last crawled. Searching for cached pages can be useful if some information seems to have disappeared from a page since you last viewed it, provided that Google has not crawled the page again in the meantime. |
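If you run the same kinds of searches repeatedly, it can help to assemble the search strings programmatically. The following is a minimal sketch in Python, not an official Google interface: the helper names build_query and search_url are hypothetical, the operator list is simply copied from Table 3.1, and the address https://www.google.com/search?q=... is assumed to be the ordinary results-page URL. The sketch only builds text; it does not contact Google, and it does not check whether Google still honors a given operator.

```python
from urllib.parse import urlencode

# Operator names copied from Table 3.1 (assumption: this list is illustrative,
# not a guarantee that Google currently supports every one of them).
OPERATORS = {
    "filetype", "site", "inurl", "allinurl", "intitle", "allintitle",
    "inanchor", "allinanchor", "related", "link", "define", "info", "cache",
}


def build_query(terms="", **operators):
    """Combine free-text terms with operator:value pairs into one search string.

    Hypothetical helper: `terms` is typed exactly as you would type it into the
    search box (including any quotation marks for phrase searching).
    """
    parts = [terms] if terms else []
    for name, value in operators.items():
        if name not in OPERATORS:
            raise ValueError(f"unknown operator: {name}")
        # No space after the colon, as in the examples in Table 3.1.
        parts.append(f"{name}:{value}")
    return " ".join(parts)


def search_url(query):
    """Percent-encode the search string into a results-page URL (assumed format)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # The filetype: example from Table 3.1.
    query = build_query('"digital literacy"', filetype="doc")
    print(query)
    print(search_url(query))

    # Operators can be combined, here restricting results to one site.
    print(build_query("filter bubble", site="thetimes.co.uk", filetype="pdf"))
```

Keeping the operators as named arguments means a misspelled operator is rejected immediately rather than being silently treated as an ordinary search term.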