Use inurl: syntax to search site subdirectories.
The site: special syntax is perfect for those situations in which you want to restrict your search to a certain domain or domain suffix like “example.com,” “www.example.org,” or “edu”: site:edu. But it breaks down when you’re trying to search for a site that exists beneath the main or default site (i.e., in a subdirectory like /~sam/album/).
For example, if you’re looking for something below the main GeoCities site, you can’t use site: to find all the pages in http://www.geocities.com/Heartland/Meadows/6485/; Google will return no results. Enter inurl:, a Google special syntax [Section 1.5] for specifying a string to be found in a resultant URL. That query, then, would work as expected like so:
inurl:www.geocities.com/Heartland/Meadows/6485/
While the http:// prefix in a URL is summarily ignored by Google when used with site:, search results come up short when you include it in an inurl: query. Be sure to remove prefixes in any inurl: query for the best (read: any) results.
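The prefix-stripping advice above can be sketched in a few lines of Python. This is only an illustration, not part of any Google tool: the helper name to_inurl_query is hypothetical, and it assumes the prefix is anything up to and including the first "://" in the URL.

```python
def to_inurl_query(url: str) -> str:
    """Build an inurl: query from a URL (hypothetical helper, for illustration).

    Assumption: a prefix such as http:// is everything up to and
    including the first "://"; a URL without one is used unchanged.
    """
    bare = url.split("://", 1)[-1]  # drop the scheme prefix if present
    return f"inurl:{bare}"


if __name__ == "__main__":
    print(to_inurl_query("http://www.geocities.com/Heartland/Meadows/6485/"))
```

Passing a URL that already lacks the prefix returns the same query, so the helper is safe to apply to either form.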
You’ll see that using the inurl: query instead of the site: query has two immediate advantages:
You can also use inurl: in combination with the site: syntax to get information about subdomains. For example, how many subdomains does O’Reilly.com really have? You can’t get that information via the query site:oreilly.com, but neither can you get it just from the query inurl:"*.oreilly.com" (because that query will pick up mirrors and other pages containing the string oreilly.com that aren’t at the O’Reilly site). However, this query will work just fine:
site:oreilly.com inurl:"*.oreilly" -inurl:"www.oreilly"
This query says to Google, “Look on the site O’Reilly.com for pages whose URLs contain the string ‘*.oreilly’ (remember the full-word wildcard? [Hack #13]) but ignore URLs with the string ‘www.oreilly’” (because that’s a subdomain you’re already very familiar with).
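The same pattern can be assembled for other domains. The following Python sketch is an assumption-laden illustration: the function name subdomain_query and its known_subdomains parameter are hypothetical, and it assumes the domain has the simple form name.suffix (such as oreilly.com), so the part before the first dot is the string to match.

```python
def subdomain_query(domain: str, known_subdomains=("www",)) -> str:
    """Compose a site: + inurl: query for finding subdomains (hypothetical helper).

    Assumption: domain looks like "name.suffix", so the text before the
    first dot is the string to look for in result URLs.
    """
    name = domain.split(".", 1)[0]
    # Restrict to the domain, then require some word before ".name" in the URL.
    terms = [f"site:{domain}", f'inurl:"*.{name}"']
    # Exclude subdomains you already know about.
    terms += [f'-inurl:"{sub}.{name}"' for sub in known_subdomains]
    return " ".join(terms)


if __name__ == "__main__":
    print(subdomain_query("oreilly.com"))
```

Adding more entries to known_subdomains appends one -inurl: exclusion per entry, which may help narrow the results as you discover subdomains.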