Search for the tilde and other special characters in URLs.
Google can find lots of different things, but at this writing, it
can’t search for special characters in the URLs of its results.
That’s a shame, because special characters can come
in handy. The tilde (~), for example, commonly marks
personal web pages.
This hack takes a query from a form, pulls results from Google, and filters the results for the presence of several different special characters in the URL, including the tilde.
Why would you want to do this? By altering this hack slightly (see
Hacking the Hack) you could restrict
your searches to just pages with a tilde in the URL, an easy way to
find personal pages. Maybe you’re looking for
dynamically generated pages with a question mark
(?) in the URL; you can’t find
these using Google by itself, but you can thanks to this hack. And of
course you can turn the hack inside out and not return results
containing ~, ?, or other special characters. In fact, this code is
more of a beginning than an end unto itself; you can tweak it in
several different ways to do several different things.
#!/usr/local/bin/perl
# aunt_tilde.pl
# Finding special characters in Google result URLs

# Your Google API developer's key
my $google_key='insert key here';

# Number of times to loop, retrieving 10 results at a time
my $loops = 10;

# Location of the GoogleSearch WSDL file
my $google_wdsl = "./GoogleSearch.wsdl";

use strict;
use CGI qw/:standard/;
use SOAP::Lite;

# Print the search form
print
  header( ),
  start_html("Aunt Tilde"),
  h1("Aunt Tilde"),
  start_form(-method=>'GET'),
  'Query: ', textfield(-name=>'query'),
  br( ),
  'Characters to find: ',
  checkbox_group(
    -name=>'characters',
    -values=>[qw/ ~ @ ? ! /],
    -defaults=>[qw/ ~ /]
  ),
  br( ),
  submit(-name=>'submit', -value=>'Search'),
  end_form( ), p( );

if (param('query')) {

  # Create a regular expression to match preferred special characters
  my $special_regex = '[' . join('', param('characters')) . ']';

  my $google_search = SOAP::Lite->service("file:$google_wdsl");

  # Offsets 0, 10, 20, ... give exactly $loops requests
  for (my $offset = 0; $offset < $loops*10; $offset += 10) {
    my $results = $google_search ->
      doGoogleSearch(
        $google_key, param('query'), $offset, 10, "false", "", "false",
        "", "latin1", "latin1"
      );

    # Stop as soon as Google runs out of results
    last unless @{$results->{resultElements}};

    foreach my $result (@{$results->{'resultElements'}}) {

      # Output only matched URLs, highlighting special characters in red
      my $url = $result->{URL};
      $url =~ s!($special_regex)!<font color="red">$1</font>!g and
        print
          p(
            b(a({href=>$result->{URL}},$result->{title}||'no title')), br( ),
            $url, br( ),
            i($result->{snippet}||'no snippet')
          );
    }
  }
  print end_html;
}
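The filtering in the script hinges on what the substitution returns: s///g evaluates to the number of replacements it made, which is false when nothing matched, so the print after "and" runs only for URLs that contain a special character. Here is a minimal standalone sketch of that idiom. The highlight helper and the sample URLs are made up for illustration and are not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every special character in a red <font> tag
# and report how many characters were wrapped.
sub highlight {
    my ($url, $special_regex) = @_;
    # s///g returns the number of replacements, or false if there were none
    my $count = ($url =~ s!($special_regex)!<font color="red">$1</font>!g) || 0;
    return ($count, $url);
}

# Sample URLs invented for this sketch
for my $original ('http://example.com/~aunt/', 'http://example.com/index.html') {
    my ($count, $marked) = highlight($original, '[~?]');
    print $count ? "kept ($count): $marked\n" : "skipped: $original\n";
}
```

Only the first sample URL is reported as kept, because it is the only one the substitution actually changed.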
There are two main ways you can change this hack.
You can easily alter the list of special characters you’re interested in by changing one line in the script:
-values=>[qw/ ~ @ ? ! /],
Simply add or remove special characters from the space-delimited list
between the / (forward slash) characters. If, for
example, you want to add & (ampersands) and z (why not?), while dropping ?
(question marks), that line of code should read:
-values=>[qw/ ~ @ ! & z /],
(Don’t forget those spaces between characters in the list.)
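One caveat when extending the list: the script joins the chosen characters straight into a regular expression character class, so characters that mean something inside a class (such as ], ^, - or \) could change or break the pattern, and an empty selection would produce an invalid one. The sketch below shows one way to guard against that, assuming you are willing to add a small helper. The name build_special_regex is hypothetical and does not appear in the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): build a character
# class from a list of characters, escaping each one with quotemeta so
# that regex metacharacters are matched literally.
sub build_special_regex {
    my @chars = @_;
    return unless @chars;    # nothing selected: no pattern at all
    return '[' . join('', map { quotemeta } @chars) . ']';
}

my $special_regex = build_special_regex(qw/ ~ @ ! & z - ] /);

# Sample URLs invented for this sketch
for my $url ('http://example.com/~aunt/',
             'http://example.com/a-b',
             'http://example.com/plain') {
    print "$url\n" if $url =~ /$special_regex/;
}
```

In the script itself, the equivalent change would be to pass param('characters') to such a helper and skip the search when it returns nothing.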
You can just as easily decide to exclude URLs containing your special
characters as include them. Simply change the =~ (read: does match) in this line:
$url =~ s!($special_regex)!<font color="red">$1</font>!g and
to !~ (read: does not match), leaving:
$url !~ s!($special_regex)!<font color="red">$1</font>!g and
Now, any result whose URL contains the specified characters will not show up.
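To see the two modes side by side outside the CGI script, here is a small sketch that filters a hard-coded list. The URLs are invented for illustration; real ones would come from the Google results. It uses a plain match rather than the substitution, since in exclude mode there is never anything to highlight.

```perl
use strict;
use warnings;

# Sample URLs invented for this sketch
my @urls = (
    'http://www.example.edu/~aunt/recipes.html',
    'http://www.example.com/catalog.cgi?item=42',
    'http://www.example.org/index.html',
);

my $special_regex = '[~?]';

# Include mode: keep only URLs containing a special character
my @with    = grep { $_ =~ /$special_regex/ } @urls;
# Exclude mode: keep only URLs containing none of them
my @without = grep { $_ !~ /$special_regex/ } @urls;

print "With:    $_\n" for @with;
print "Without: $_\n" for @without;
```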