Building queries to search only recent commentary appearing in weblogs.
Time was, when you needed to find current commentary, you didn’t turn to a full-text search engine like Google. You searched Usenet, combed mailing lists, or browsed current news sites like CNN.com and hoped for the best.
But as search engines have evolved, they’ve been able to index pages more quickly than once every few weeks. In fact, Google tunes its engine to more readily index sites with a high information churn rate. At the same time, a phenomenon called the weblog (http://www.oreilly.com/catalog/essblogging/) has arisen: an online site that keeps a running commentary and associated links, updated daily and, in many cases, even more often. Google indexes many of these sites on an accelerated schedule. If you know how to find them, you can build a query that searches just these sites for recent commentary.
When weblogs first appeared on the Internet, they were generally updated manually or by using homemade programs. Thus, there were no standard words you could add to a search engine query to find them. Now, however, many weblogs are created using either specialized software packages (like Movable Type, http://www.movabletype.org/, or Radio Userland, http://radio.userland.com/) or web services (like Blogger, http://www.blogger.com/). Weblogs built with these programs and services are more easily found online with some clever use of special syntaxes [Section 1.5] or magic words.
For hosted weblogs, the site: syntax makes things easy. Blogger weblogs hosted at blog*spot (http://www.blogspot.com/) can be found using site:blogspot.com. Even though Radio Userland is a software program able to post its weblogs to any web server, you can find the majority of Radio Userland weblogs at the Radio Userland community server (http://radio.weblogs.com/) using site:radio.weblogs.com.
Finding weblogs powered by weblog software and hosted elsewhere is more problematic; Movable Type weblogs, for example, can be found all over the Internet. However, most of them sport a “powered by movable type” link of some sort; searching for the phrase "powered by movable type" will, therefore, find many of them.
It comes down to magic words typically found on weblog pages, shout-outs, if you will, to the software or hosting sites. The following is a list of some of these packages and services and the magic words used to find them in Google:
Blogger: "powered by blogger" or site:blogspot.com
Blosxom: "powered by blosxom"
Greymatter: "powered by greymatter"
Geeklog: "powered by geeklog"
Manila: "a manila site" or site:editthispage.com
Pitas: site:pitas.com
pMachine: "powered by pmachine"
uJournal: site:ujournal.org
LiveJournal: site:livejournal.com
Radio Userland: intitle:"radio weblog" or site:radio.weblogs.com
Because you can’t have more than 10 words in a Google query, there’s no way to build a query that includes every conceivable weblog’s magic words. It’s best to experiment with the various words, and see which weblogs have the materials you’re interested in.
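Since no single query can carry every magic word, one workable approach is to generate a separate query per magic word and try them in turn. The sketch below is only an illustration: the names MAGIC_WORDS, count_words, and build_queries are hypothetical, the list is a subset of the magic words given above, and counting whitespace-separated tokens is an assumed approximation of how Google counts words against its limit. It builds query strings only and does not contact Google.

```python
# A subset of the magic words listed above (hypothetical selection).
MAGIC_WORDS = [
    '"powered by blogger"',
    'site:blogspot.com',
    '"powered by movable type"',
    '"powered by greymatter"',
    'site:livejournal.com',
    'intitle:"radio weblog"',
]

MAX_WORDS = 10  # the query limit mentioned in the text


def count_words(query):
    """Count whitespace-separated tokens.

    Assumption: this approximates how the search engine counts words.
    """
    return len(query.split())


def build_queries(topic, magic_words=MAGIC_WORDS, max_words=MAX_WORDS):
    """Return one query per magic word, skipping any that exceed the limit."""
    queries = []
    for magic in magic_words:
        query = f"{topic} {magic}"
        if count_words(query) <= max_words:
            queries.append(query)
    return queries


if __name__ == "__main__":
    # One query per weblog package or service, each searching the same topic.
    for q in build_queries('"baseball strike"'):
        print(q)
```

Running each generated query separately sidesteps the word limit, at the cost of having to look through several result sets.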
First of all, realize that weblogs are usually informal commentary, and you’ll have to keep an eye out for misspelled words, names, etc. Generally, it’s better to search by event than by name, if possible. For example, if you were looking for commentary on a potential strike, the phrase "baseball strike" would be a better search, initially, than a search for the name of the Commissioner of Major League Baseball, Bud Selig.
You can also try to search for a word or phrase relevant to the event. For example, for a baseball strike you could try searching for "baseball strike" "red sox" (or "baseball strike" bosox). If you’re searching for information on a wildfire and wondering if anyone has been arrested for arson, try wildfire arrested and, if that doesn’t work, wildfire arrested arson. (Why not search for arson to begin with? Because it’s not certain that a weblog commentator would use the word “arson.” Instead, he might just refer to someone being arrested for setting the fire. “Arrested” in this case is a more certain word than “arson.”)
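The habit of starting with the most certain words and adding less certain ones only as needed can be sketched as a small helper. This is a minimal illustration, not part of any Google tool: the function name narrowing_queries and its parameters are hypothetical, and it only produces the query strings in the order you might try them.

```python
def narrowing_queries(base_terms, extra_terms, magic=None):
    """Return queries in the order to try them.

    The first query uses only the most certain terms; each later query
    adds one more of the less certain terms. If a magic word is given,
    it is appended to every query to restrict results to weblogs.
    """
    queries = []
    terms = list(base_terms)
    for extra in [None] + list(extra_terms):
        if extra is not None:
            terms.append(extra)
        query = " ".join(terms)
        if magic:
            query = f"{query} {magic}"
        queries.append(query)
    return queries


if __name__ == "__main__":
    # Try "wildfire arrested" first, then add "arson" as a second attempt.
    for q in narrowing_queries(["wildfire", "arrested"], ["arson"],
                               magic="site:blogspot.com"):
        print(q)
```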