A hack that runs a query against some of the available Google API specialty topics.
Google doesn’t talk about it much, but it does make specialty web searches available. And I’m not just talking about searches limited to a certain domain. I’m talking about searches that are devoted to a particular topic. The Google API makes four of these searches available: the U.S. Government, Linux, BSD, and Macintosh.
In this hack, we’ll look at a program that takes a query from a web form and reports, for Google as a whole and for each specialty topic, the number of results for that query along with the top result.
Why search by topic? Because at this writing Google indexes over 3 billion pages. Unless your query is very specific, you might find yourself with far too many results. Narrowing your search by topic can get you good results without forcing you to zero in precisely on your query.
You can also use it to do some decidedly unscientific research. Which topic contains more occurrences of the phrase “open source”? Which contains the most pages from .edu (educational) domains? Which topic, Macintosh or FreeBSD, has more on user interfaces? Which topic holds the most for Monty Python fans?
#!/usr/local/bin/perl
# gootopic.cgi
# Queries across Google Topics (and All of Google), returning
# number of results and top result for each topic.
# gootopic.cgi is called as a CGI with form input

use strict;
use SOAP::Lite;
use CGI qw/:standard *table/;

# Your Google API developer's key
my $google_key = 'insert key here';

# Location of the GoogleSearch WSDL file
my $google_wsdl = "./GoogleSearch.wsdl";

# Google Topics: restrict code => display name
# ('' means no restriction, i.e., all of Google)
my %topics = (
    ''       => 'All of Google',
    unclesam => 'U.S. Government',
    linux    => 'Linux',
    mac      => 'Macintosh',
    bsd      => 'FreeBSD'
);

# Display the query form
print
    header( ),
    start_html("GooTopic"),
    h1("GooTopic"),
    start_form(-method=>'GET'),
    'Query: ', textfield(-name=>'query'),
    ' ',
    submit(-name=>'submit', -value=>'Search'),
    end_form( ), p( );

my $google_search = SOAP::Lite->service("file:$google_wsdl");

# Perform the queries, one for each topic area
if (param('query')) {
    print
        start_table({-cellpadding=>'10', -border=>'1'}),
        Tr([th({-align=>'left'}, ['Topic', 'Count', 'Top Result'])]);

    # Sorting puts '' (All of Google) first and keeps the row order stable
    foreach my $topic (sort keys %topics) {
        my $results = $google_search->doGoogleSearch(
            $google_key, param('query'), 0, 10, "false", $topic,
            "false", "", "latin1", "latin1"
        );

        my $result_count = $results->{'estimatedTotalResultsCount'};

        my $top_result = 'no results';
        if ($result_count) {
            my $t = $results->{'resultElements'}->[0];
            $top_result =
                b($t->{title} || 'no title') . br( ) .
                a({href=>$t->{URL}}, $t->{URL}) . br( ) .
                i($t->{snippet} || 'no snippet');
        }

        # Output one table row per topic
        print Tr([ td([ $topics{$topic}, $result_count, $top_result ]) ]);
    }

    print end_table( );
}

print end_html( );
The form code is built into the hack, so just call the hack with the URL of the CGI script. For example, if I were running the program on researchbuzz.com and it were called gootopic.cgi, my URL might look like http://www.researchbuzz.com/cgi-bin/gootopic.cgi.
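Because the form submits with GET and the script only checks the query parameter, you can also skip the form and link straight to a results page. The following is a minimal sketch, assuming the script lives at the example address above; the helper names form_encode and gootopic_url are hypothetical. It encodes the query the way a browser encodes a form field (spaces become +, other reserved characters become %XX) and assumes single-byte (latin1) input, matching the encoding the script passes to Google.

```perl
use strict;
use warnings;

# Hypothetical helper: encode a value as a browser encodes a form field
# (application/x-www-form-urlencoded). Unreserved characters are kept,
# spaces become '+', and everything else becomes %XX.
# Assumes single-byte (latin1) characters.
sub form_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~ ])/sprintf('%%%02X', ord $1)/ge;
    $s =~ tr/ /+/;
    return $s;
}

# Hypothetical helper: build a GET URL for gootopic.cgi.
# Only 'query' is needed, since the script tests param('query').
sub gootopic_url {
    my ($base, $query) = @_;
    return $base . '?query=' . form_encode($query);
}

print gootopic_url('http://www.researchbuzz.com/cgi-bin/gootopic.cgi',
                   '"user interface"'), "\n";
```

Dropping a URL like this into a bookmark or a web page reruns the same topic comparison without retyping the query.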
Provide a query and the script will search for your query in each special topic area, providing you with an overall (“All of Google”) count, topic area count, and the top result for each. Figure 6-21 shows a sample run for "user interface" with Macintosh coming out on top.
Trying to figure out how many pages each topic finds for particular top-level domains (e.g., .com, .edu, .uk) is rather interesting. You can query for inurl:xx site:xx, where xx is the top-level domain you’re interested in. For example, inurl:va site:va searches for any of the Vatican’s pages in the various topics; there aren’t any. inurl:mil site:mil finds an overwhelming number of results in the U.S. Government special topic, which is no surprise.
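If you want to run the inurl:xx site:xx comparison for several top-level domains, a short script can generate the queries to paste into the form. This is a minimal sketch; the helper name tld_query and the list of domains are illustrative choices, not part of the hack. It accepts a TLD with or without its leading dot and lowercases it.

```perl
use strict;
use warnings;

# Hypothetical helper: build the "inurl:xx site:xx" query for one TLD.
# A leading dot is stripped and the TLD is lowercased.
sub tld_query {
    my ($tld) = @_;
    $tld =~ s/^\.//;
    $tld = lc $tld;
    return "inurl:$tld site:$tld";
}

# Illustrative list of domains to compare across the topics
print tld_query($_), "\n" for qw(.com .edu .uk va mil);
```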
If you are in the mood for a party game, try to find the weirdest possible searches that appear in all the special topics. “Papa Smurf” is as good a query as any other. In fact, at this writing, that search has more results in the U.S. Government specialty search than in the others.