There is another recipe in this module that illustrates how to extract e-mails from a website. This recipe will show you how to create a local Maltego transform, which you can then use within Maltego itself to generate information. It can be used in conjunction with URL spidering transforms to pull e-mails from entire websites.
The following script extracts e-mail addresses from a web page with a regular expression and prints them as a Maltego transform response:
import urllib2
import re
import sys

# The target URL is passed as the first command-line argument
tarurl = sys.argv[1]
url = urllib2.urlopen(tarurl).read()

# Matches name@domain.com as well as the obfuscated "name at domain dot com"
regex = re.compile(r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                   r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                   r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)")

# Opening tags of the Maltego transform response
print "<MaltegoMessage>"
print "<MaltegoTransformResponseMessage>"
print "  <Entities>"

emails = re.findall(regex, url)
for email in emails:
    # Each match is a tuple of groups; email[0] is the full address
    print '    <Entity Type="maltego.EmailAddress">'
    print "      <Value>" + str(email[0]) + "</Value>"
    print "    </Entity>"

# Close the tags opened above
print "  </Entities>"
print "</MaltegoTransformResponseMessage>"
print "</MaltegoMessage>"
The top of the script imports the necessary modules. We then assign the URL supplied as the first command-line argument to tarurl, open it with urllib2, and read the page content into the url variable:
tarurl = sys.argv[1]
url = urllib2.urlopen(tarurl).read()
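urllib2 exists only in Python 2; in Python 3 the same urlopen() call lives in urllib.request, and read() returns bytes rather than str, so the body has to be decoded before a text regex can search it. A minimal Python 3 sketch of this step might look like the following. The helper name fetch_page, the 10-second timeout, and the UTF-8 fallback are assumptions for illustration, not part of the original recipe.

```python
import urllib.request


def fetch_page(target_url, timeout=10):
    """Download target_url and return its body as text (hypothetical helper)."""
    with urllib.request.urlopen(target_url, timeout=timeout) as resp:
        body = resp.read()  # bytes in Python 3
        # Use the charset the server declares; assume UTF-8 if none is given
        charset = resp.info().get_content_charset() or "utf-8"
    # Replace undecodable bytes instead of aborting the whole transform
    return body.decode(charset, errors="replace")
```

In the script, the two lines above would then become url = fetch_page(sys.argv[1]).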
We then create a regular expression that matches the format of a standard e-mail address:
regex = re.compile(r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                   r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                   r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)")
The preceding regular expression should match e-mail addresses written either in the usual format, e-mail@address.com, or in the obfuscated form e-mail at address dot com.
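To see what the pattern actually catches, the following Python 3 sketch runs it against a short, made-up string. The pattern is the recipe's expression with its backslashes intact, written as adjacent raw-string literals; the sample text is invented for illustration.

```python
import re

# The recipe's pattern, split across adjacent raw-string literals
EMAIL_REGEX = re.compile(
    r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
    r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
    r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"
)

sample = "Contact admin@example.com or write to bob at example dot org."
for match in EMAIL_REGEX.findall(sample):
    # match is a tuple of the three groups; match[0] is the whole address
    print(match[0])
```

This prints admin@example.com and bob at example dot org; the trailing full stop is not included. Note that the character classes are lowercase only, so an address such as Admin@Example.com is not matched unless the pattern is compiled with re.IGNORECASE.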
We then print the opening tags that a valid Maltego transform response requires:
print "<MaltegoMessage>"
print "<MaltegoTransformResponseMessage>"
print "  <Entities>"
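Then, we find all instances of text in the url content that match our regular expression. Because the pattern contains three capture groups, re.findall() returns a tuple for each match, and the first element of that tuple holds the complete address: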
emails = re.findall(regex, url)
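A page that repeats an address in, for example, its header and footer yields the same match more than once, and each duplicate would become a separate entity in the output. A small Python 3 helper can keep only the full addresses and drop repeats while preserving page order; the name unique_emails is hypothetical and this de-duplication step is not part of the original recipe.

```python
import re

# The recipe's pattern, split across adjacent raw-string literals
EMAIL_REGEX = re.compile(
    r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
    r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
    r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"
)


def unique_emails(page_text):
    """Return each matched address once, in order of first appearance."""
    seen = set()
    result = []
    for groups in EMAIL_REGEX.findall(page_text):
        address = groups[0]  # group 1 spans the whole address
        if address not in seen:
            seen.add(address)
            result.append(address)
    return result
```

The loop could then iterate over unique_emails(url) instead of the raw findall() result, printing address rather than email[0].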
We then take each e-mail address we have found and output it in the correct format for a Maltego transform response:
for email in emails:
    print '    <Entity Type="maltego.EmailAddress">'
    print "      <Value>" + str(email[0]) + "</Value>"
    print "    </Entity>"
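The regular expression allows characters such as & in the local part of an address, and concatenating the raw match into the Value tag would then produce malformed XML, because a bare & is not allowed in XML text. One way to guard against this, sketched below in Python 3, is to pass each address through the standard library's xml.sax.saxutils.escape() before printing it. The function name entity_xml is hypothetical.

```python
from xml.sax.saxutils import escape


def entity_xml(address):
    """Build one maltego.EmailAddress entity with its value XML-escaped."""
    return (
        '    <Entity Type="maltego.EmailAddress">\n'
        "      <Value>" + escape(address) + "</Value>\n"  # & < > become entities
        "    </Entity>"
    )


print(entity_xml("tom&jerry@example.com"))
```

Here the value is printed as tom&amp;jerry@example.com, which an XML parser reads back as the original address.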
Finally, we close the tags that we opened earlier:
print "  </Entities>"
print "</MaltegoTransformResponseMessage>"
print "</MaltegoMessage>"
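Printing opening and closing tags by hand makes it easy to leave one unclosed. As an alternative, assuming Python 3, the standard library's xml.etree.ElementTree can build the same MaltegoMessage, MaltegoTransformResponseMessage, Entities, Entity, and Value structure used in this recipe and escape values automatically; parsing the result back is a quick well-formedness check. The function name build_response is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_response(addresses):
    """Return a Maltego transform response listing each address as an entity."""
    message = ET.Element("MaltegoMessage")
    response = ET.SubElement(message, "MaltegoTransformResponseMessage")
    entities = ET.SubElement(response, "Entities")
    for address in addresses:
        entity = ET.SubElement(entities, "Entity", Type="maltego.EmailAddress")
        # ElementTree escapes characters such as & when serializing
        ET.SubElement(entity, "Value").text = address
    return ET.tostring(message, encoding="unicode")


xml_text = build_response(["admin@example.com", "tom&jerry@example.com"])
print(xml_text)
```

The output is a single line without indentation; the element structure is the same as the hand-printed version.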