Take a break from text-crawling and check out Google Images (http://images.google.com/), an index of over 390 million images available on the Web. While sorely lacking in special syntaxes [Section 1.5], the Advanced Image Search (http://images.google.com/advanced_image_search) does offer some interesting options.
Of course, any options on the Advanced Image Search page can be expressed via a little URL hacking [Hack #9].
Google’s image search starts with a plain keyword search. Images are indexed under a variety of keywords, some broader than others; be as specific as possible. If you’re searching for cats, don’t use cat as a keyword unless you don’t mind getting results that include “cat scan.” Use words that are more uniquely cat-related, like feline or kitten. Narrow down your query as much as possible, using as few words as possible. A query like feline fang, which would get you over 3,000 results on Google, will get you no results on Google Image Search; in this case, cat fang works better. (Building queries for image searching takes a lot of patience and experimentation.)
Search results include a thumbnail, name, size (both pixels and kilobytes), and the URL where the picture is to be found. Clicking the picture presents a framed page, with Google’s thumbnail of the image at the top and the page where the image originally appeared at the bottom. Figure 2-2 shows a Google Images page.
Searching Google Images can be a real crapshoot, because it’s difficult to build multiple-word queries, and single-word queries lead to thousands of results. You do have more options to narrow your search both through the Advanced Image Search interface and through the Google Image Search special syntaxes.
The Google Advanced Image Search (http://images.google.com/advanced_image_search) allows you to specify the size (expressed in pixels, not kilobytes) of the returned image. You can also specify the kind of pictures you want to find (Google Images indexes only JPEG and GIF files), image color (black and white, grayscale, or full color), and any domain to which you wish to restrict your search.
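Since any option on the Advanced Image Search page can also be expressed in the URL, the choices above can be sketched as a small URL builder. The Python below is only an illustration: the base URL and the parameter names (imgsz, as_filetype, imgc, as_sitesearch) are assumptions that this text does not confirm, so run a search from the form and read the real names off the resulting URL before relying on them. The function name advanced_image_url is likewise made up for this example.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names; verify them against the URL that
# the Advanced Image Search form actually produces.
BASE_URL = "http://images.google.com/images"
OPTION_PARAMS = {
    "size": "imgsz",            # image size
    "filetype": "as_filetype",  # JPEG or GIF
    "color": "imgc",            # black and white, grayscale, or full color
    "domain": "as_sitesearch",  # restrict results to a domain
}


def advanced_image_url(query, **options):
    """Build a search URL from a query plus any of the options above."""
    params = {"q": query}
    for name, value in options.items():
        if name not in OPTION_PARAMS:
            raise ValueError("unknown option: " + name)
        params[OPTION_PARAMS[name]] = value
    # urlencode escapes spaces and punctuation in the values.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(advanced_image_url("kitten", filetype="gif", domain="example.com"))
```

Keeping the option-to-parameter mapping in one dictionary means only that table needs to change if the real parameter names turn out to differ.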
Google Image Search also uses three levels of filtering: none, moderate, and strict. Moderate filters only explicit images, while strict filters both images and text. While automatic filtering doesn’t guarantee that you won’t find any offensive content, it will help. However, sometimes filtering works against you. If you’re searching for images related to breast cancer, Google’s strict filtering will cut down greatly on your potential number of results. Any time you’re using a word that might be considered offensive, even in an innocent context, you’ll have to consider turning off the filters or risk missing relevant results. One way to get around the filtering is to try alternate words. If you’re searching for breast cancer images, try searching for mammograms or Tamoxifen, a drug used to treat breast cancer.
Google Images offers a few special syntaxes:
intitle:
Finds keywords in the page title. This is an excellent way to narrow down search results.
filetype:
Finds pictures of a particular type. This only works for JPEG and GIF, not BMP, PNG, or any number of other formats Google doesn’t index. Note that searching for filetype:jpg and filetype:jpeg will get you different results, because the filtering is based on file extension, not some deeper understanding of the file type.
inurl:
As with any regular Google search, finds the search term in the URL. The results for this one can be confusing. For example, you may search for inurl:cat and get the following URL as part of the search result:
www.example.com/something/somethingelse/something.html
Hey, where’s the cat? Because Google indexes the graphic name as part of the URL, it’s probably there. If the page above includes a graphic named cat.jpg, that’s what Google is finding when you search for inurl:cat. It’s finding the cat in the name of the picture, not in the URL itself.
site:
As with any other Google web search, restricts your results to a specified host or domain. Don’t use this to restrict results to a certain host unless you’re really sure what’s there. Instead, use it to restrict results to certain domains. For example, search for football site:uk and then search for football site:com; comparing the two result sets shows how dramatic a difference site: can make.
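The syntaxes above can be combined in a single query string. The following Python sketch shows one way to assemble such queries and turn them into search URLs. The function names (build_query, image_search_url, jpeg_queries) are invented for illustration, and the http://images.google.com/images?q= URL form is an assumption rather than something stated here. Because filetype:jpg and filetype:jpeg return different results, the sketch also produces one query per extension so that neither set is missed.

```python
from urllib.parse import urlencode

# Assumed base URL for an image search; not confirmed by the text above.
BASE_URL = "http://images.google.com/images"


def build_query(keywords, intitle=None, filetype=None, inurl=None, site=None):
    """Join plain keywords with any of the special syntaxes described above."""
    parts = list(keywords)
    if intitle:
        parts.append("intitle:" + intitle)
    if filetype:
        parts.append("filetype:" + filetype)
    if inurl:
        parts.append("inurl:" + inurl)
    if site:
        parts.append("site:" + site)
    return " ".join(parts)


def image_search_url(query):
    """URL-encode the query into the (assumed) q parameter."""
    return BASE_URL + "?" + urlencode({"q": query})


def jpeg_queries(keywords, **syntaxes):
    """Return one query per JPEG extension, since each gives different results."""
    return [build_query(keywords, filetype=ext, **syntaxes)
            for ext in ("jpg", "jpeg")]


if __name__ == "__main__":
    print(build_query(["football"], site="uk"))
    for query in jpeg_queries(["kitten"], intitle="cat"):
        print(image_search_url(query))
```

Running the two JPEG queries separately and merging the results by hand is a simple way around the extension-based matching.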