Geovisualization

Another common application of social media data is geo-based visualization. In the tweet structure, we saw attributes named geo, longitude, and latitude. Once you have access to these values, it is very easy to use one of the common visualization libraries, such as D3, to come up with something like this:

[Figure: Geovisualization of tweets across the U.S.]

This is just an example of what we can achieve with this kind of visualization; here, we visualized tweets across the U.S. We can clearly see areas of increased intensity in eastern cities such as New York. A similar analysis of customer tweets can give a company clear insight into the most popular places among its customer base. We can also text-mine these tweets for sentiment and then infer, for example, the states in which customers are unhappy with the company.
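As a minimal sketch of the data preparation step, assuming tweets is the list of parsed tweet dictionaries from the earlier Twitter examples and points.geojson is a file name of our own choosing, we can dump the coordinates as GeoJSON, which D3 can render directly:

>>>import json
>>>features = []
>>>for tweet in tweets:
>>>    if tweet.get('geo'):    # keep only geo-tagged tweets
>>>        lat, lon = tweet['geo']['coordinates']    # Twitter stores [lat, lon]
>>>        features.append({'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [lon, lat]}, 'properties': {'text': tweet['text']}})
>>>fo = open('points.geojson', 'w')
>>>fo.write(json.dumps({'type': 'FeatureCollection', 'features': features}))
>>>fo.close()

Note that GeoJSON expects coordinates in [longitude, latitude] order, which is the reverse of Twitter's geo attribute.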

Influencer detection

Detecting the important nodes in a social graph is another classic problem in this context. If we have millions of tweets streaming in about our company, one important use case is to identify the most influential customers on social media, and then target them for branding, marketing, or improving customer engagement.

In the case of Twitter, this goes back to graph theory and the concept of PageRank: for a given node, if the ratio of followers (incoming edges) to friends (outgoing edges) is high, then that node is an influencer. This is very intuitive, since people who have more followers than the number of people they follow are typically influencers. One company, Klout (https://klout.com/), has been focusing on a similar problem. Let's write a very basic and intuitive algorithm to calculate a Klout-like score:

>>>klout_scores = [(tweet['user']['followers_count'] / float(tweet['user']['friends_count']), tweet['user']) for tweet in tweets if tweet['user']['friends_count'] > 0]
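To put these scores to use, we can sort them in decreasing order and keep the top entries; a minimal sketch (the cut-off of ten is arbitrary):

>>>klout_scores.sort(key=lambda score_user: score_user[0], reverse=True)
>>>top_influencers = klout_scores[:10]    # users with the highest follower/friend ratio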

Most of the examples we worked through for Twitter also hold for Facebook, with the content field modified accordingly. We can build a trending topic example with Facebook posts, visualize Facebook users and posts on a geomap, and build the same kind of influencer use cases. In fact, in the next section, we will see a variation of this in the context of Facebook.

Facebook

Facebook is a more personal, and somewhat more private, social network. Facebook does not allow you to gather users' feeds/posts, simply for security and privacy reasons; its Graph API instead provides a limited way of accessing the feeds of a given page. I would recommend that you go to https://developers.facebook.com/docs/graph-api/using-graph-api/v2.3 for a better understanding.

The next question is how to access the Graph API using Python and how to get started with it. There are many wrappers written over Facebook's API, and we will use one of the most common ones, the Facebook SDK:

$ pip install facebook-sdk

Tip

You can also install it from the source repository:

https://github.com/Pythonforfacebook/facebook-sdk.

The next step is to get an access token for our application; Facebook treats every API call as coming from an application, so even for this data collection step, we will pretend to be an application.

Note

To get your token, go to:

https://developers.facebook.com/tools/explorer.

We are all set now! Let's start with one of the most widely used Facebook Graph APIs. Facebook provides a graph-based search for pages, users, events, places, and so on. The process of getting to the posts is therefore a two-stage process: we first look up the specific page ID / user ID related to our topic of interest, and then we access the feeds of that page. One simple use case for this kind of exercise could be to take the official page of a company and look for customer complaints. The way to go about this is:

>>>import facebook
>>>import json

>>>fo = open("fdump.txt", 'w')
>>>ACCESS_TOKEN = 'XXXXXXXXXXX' # https://developers.facebook.com/tools/explorer
>>>fb = facebook.GraphAPI(ACCESS_TOKEN)     # attach the token to the Graph API client
>>>company_page = "326249424068240"         # the page ID we are interested in
>>>content = fb.get_object(company_page)    # one REST call to fetch the page object
>>>fo.write(json.dumps(content))
>>>fo.close()

The code attaches the token to the Facebook Graph API client and then makes a REST call to Facebook. The catch is that we have to have the ID of the given page beforehand. A truncated version of the JSON response written to fdump.txt looks as follows:

"website":"www.dimennachildrenshistorymuseum.org",
"can_post":true,
"category_list":[
{
"id":"244600818962350",
"name":"History Museum"
},
{
"id":"187751327923426",
"name":"Educational Organization"
}
],
"likes":1793,
},
"id":"326249424068240",
"category":"Museum/art gallery",
"has_added_app":false,
"talking_about_count":8,
"location":{
"city":"New York",
"zip":"10024",
"country":"United States",
"longitude":-73.974413,
"state":"NY",
"street":"170 Central Park W",
"latitude":40.779236
},
"is_community_page":false,
"username":"nyhistorykids",
"description":"The first-ever museum bringing American history to life through the eyes of children, where kids plus history equals serious fun! Kids of all ages can practice their History Detective skills at the DiMenna Children's History Museum and:

u2022 discover the past through six historic figure pavilions

u2022!",
"hours":{
""thu_1_close":"18:00"
},
"phone":"(212) 873-3400",
"link":"https://www.facebook.com/nyhistorykids",
"price_range":"$ (0-10)",
"checkins":1011,
"about":"The DiMenna Children' History Museum is the first-ever museum bringing American history to life through the eyes of children. Visit it inside the New-York Historical Society!",
"name":"New-York Historical Society DiMenna Children's History Museum",
"cover":{
"source":"https://scontent.xx.fbcdn.net/hphotos-xpf1/t31.0-8/s720x720/1049166_672951706064675_339973295_o.jpg",
"cover_id":"672951706064675",
"offset_x":0,
"offset_y":54,
"id":"672951706064675"
},
"were_here_count":1011,
"is_published":true
},

Here, we see a schema for Facebook data similar to the one we saw for Twitter, and we can now pick out the information required for our use case. In most cases, the user posts, category, name, about, and likes are the important fields. In this example, we are showing the page of a museum, but in a more business-driven use case, a company page has a long list of posts and other useful information that can give great insights about it.
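For instance, a minimal sketch that prints some of these fields from the content object we fetched above (assuming the page exposes them):

>>>for field in ['name', 'category', 'about', 'likes', 'talking_about_count']:
>>>    print field, ':', content.get(field)    # prints None if a field is missing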

Let's say I have a Facebook page for my organization, xyz.org, and I want to know which users complained about me on the page; this is a good fit for a use case such as complaint classification. The way to build this application is simple enough: you can look for a set of keywords in fdump.txt, or go as far as scoring each post with a text classification algorithm, as we learned in Chapter 6, Text Classification.
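A minimal sketch of the keyword variant, assuming we have dumped the page's posts into fdump.txt one JSON object per line, and with a hypothetical keyword list of our own choosing:

>>>import json
>>>complaint_words = ['pathetic', 'worst', 'bad service', 'refund', 'nobody responded']    # hypothetical keywords
>>>for line in open('fdump.txt'):
>>>    post = json.loads(line)
>>>    text = post.get('message', '').lower()
>>>    if any(word in text for word in complaint_words):
>>>        print post.get('id'), ':', text    # candidate complaint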

The other use case could be to look for a topic of interest and then search the resulting pages for open posts and comments. This is exactly analogous to searching with the graph search bar on your Facebook home page; the power of doing it programmatically is that we can conduct these searches and then recursively parse each resulting page for user comments (see the sketch after these examples). The code for searching is as follows:

User search:
>>>fb.request("search", {'q' : 'nitin', 'type' : 'user'})
Place search, based on the nearest location:
>>>fb.request("search", {'q' : 'starbucks', 'type' : 'place'})
Look for open pages:
>>>fb.request("search", {'q' : 'Stanford university', 'type' : 'page'})
Look for events matching the keyword:
>>>fb.request("search", {'q' : 'beach party', 'type' : 'event'})

Once we have dumped all the relevant data into a structured format, we can apply some of the concepts we learned in the topics of NLP and machine learning. Let's pick the same use case: finding posts on a Facebook page that are likely to be complaints.

I assume that we now have the data in the following format:

Userid      FB Post
XXXX0001    The product was pathetic and I tried reaching out to your customer care, but nobody responded
XXXX002     Great work guys
XXXX003     Where can I call to get my account activated ??? Really bad service

We will go back to the same example from Chapter 6, Text Classification, where we built a text classifier to detect whether an SMS (text message) was spam. Similarly, we can create training data from this kind of dump: from the given set of posts, we ask manual taggers to tag the comments that are complaints and the ones that are not. Once we have significant training data, we can build the same text classifier:

fb_classification.py
>>>from sklearn.feature_extraction.text import TfidfVectorizer
>>># x_train / x_test hold the raw post text, y_train the complaint labels
>>>vectorizer = TfidfVectorizer(min_df=2, ngram_range=(1, 2), stop_words='english', strip_accents='unicode', norm='l2')
>>>X_train = vectorizer.fit_transform(x_train)
>>>X_test = vectorizer.transform(x_test)

>>>from sklearn.linear_model import SGDClassifier
>>>clf = SGDClassifier(alpha=.0001, n_iter=50).fit(X_train, y_train)
>>>y_pred = clf.predict(X_test)

Let's assume that these three are our only samples. We can tag the first and the third as complaints, while the second is not a complaint. We build a vectorizer of unigrams and bigrams in the same way as before, and then a classifier using the same process. I ignored some of the preprocessing steps here; you can use the same process as discussed in Chapter 6, Text Classification. In some cases, it will be hard or expensive to get training data like this. In such cases, we can apply an unsupervised algorithm, such as text clustering or topic modeling. The other way is to use a different dataset that is openly available, build a model on it, and apply that model here. For example, in the same use case, we can crawl some of the customer complaints available on the Web and use them as training data for our model. This can work as a good proxy for labeled data.
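For completeness, a toy sketch of what the inputs to the preceding pipeline would look like with just these three samples (a real model needs far more data, and min_df=2 would have to be relaxed for so tiny a corpus):

>>>x_train = ['The product was pathetic and I tried reaching out to your customer care, but nobody responded', 'Great work guys', 'Where can I call to get my account activated ??? Really bad service']
>>>y_train = [1, 0, 1]    # 1 = complaint, 0 = not a complaint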

Influencer friends

Another use case of social media is finding the most influential people in your social graph. In our case, it is finding a node that has a large number of inlinks as well as outlinks; that node will be the influencer in the graph.

The same problem in a business context is finding the most influential customers and targeting them for the marketing of our products.

The code for finding influencer friends is as follows:

>>>friends = fb.get_connections("me", "friends")["data"]    # my friend list
>>>print friends
>>>for frd in friends:
>>>    print fb.get_connections(frd["id"], "friends")       # each friend's friends

Once you have a list of all your friends and mutual friends, you can create a data structure like this:

source node    destination node    link_exist
Friend 1       Friend 2            1
Friend 1       Friend 3            1
Friend 2       Friend 3            0
Friend 1       Friend 4            1
This kind of data structure can be used to generate a network, and is a very good way of visualizing the social graph. I used D3 here, but Python also has a library called NetworkX (https://networkx.github.io/) that can be used to generate a graph visualization, as shown in the following figure. To generate the visualization, you need to build an adjacency matrix, created on the basis of the preceding information about who is a friend of whom.

[Figure: Visualization of a sample network in D3]
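A minimal NetworkX sketch under the same assumptions, turning rows of the preceding table into edges (only pairs with link_exist = 1) and drawing the result with matplotlib:

>>>import networkx as nx
>>>import matplotlib.pyplot as plt
>>>rows = [('Friend 1', 'Friend 2', 1), ('Friend 1', 'Friend 3', 1), ('Friend 2', 'Friend 3', 0), ('Friend 1', 'Friend 4', 1)]
>>>G = nx.Graph()
>>>G.add_edges_from([(src, dst) for src, dst, link in rows if link == 1])    # keep only existing links
>>>nx.draw(G, with_labels=True)
>>>plt.show()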
