Chapter 12

Advanced and Emerging Analytical Methodologies

So far we have covered only a few methodologies for analyzing social media and related data. Numerous other methodologies exist, many of which are complex and only now starting to be modified for and applied to social media data. This chapter briefly reviews a few advanced and emerging analytical methodologies you may come across and eventually need in your work. It first introduces and then describes the three methodologies of cluster analysis, geo-spatial network analysis, and agent modeling.

Expanding the Scope of Analysis

You can analyze social media and related data using a variety of analytical methodologies. Some are readily applicable to social media data and thus are used by analysts today—we covered some of them in Chapters 5 and 6. Many more methodologies exist and they differ in their applicability and relevance. Some of these methodologies tend to be harder to use because they are complex or require more advanced technical training. Some also require the type and size of data that is only now becoming available. Undoubtedly, as more people start to analyze social media data, the variety and number of relevant and applicable analytical methodologies will grow.

In this chapter, we briefly review three advanced and emerging analytical methodologies that are becoming increasingly applicable and relevant for analyzing or using social media data. Analysts have been using the three methodologies for quite some time in other fields and for other types of data. They differ in their complexity or difficulty, how they use social media data, and what types of questions they can help you answer. Cluster analysis, geo-spatial network analysis, and agent modeling were introduced briefly at the end of Chapter 3, and we explore them in this chapter in the order of how difficult they are to implement, starting with the least difficult. We measure difficulty in terms of the available software tools, resources, and technical skill it takes to use the methodology. Unlike with the methodologies in Chapters 5 and 6, we do not get into great detail about how you can use all the methodologies. However, we do point out resources and prepare you to at least understand the relevance of these methodologies to your mission sets. Based on our descriptions alone, you may be able to implement the first two methodologies. The last one, however, takes considerably more technical skill and familiarity with the science of data modeling. Expect to come across these methodologies more in the future. Understanding the content in this chapter will prepare you to learn and use these tools and review studies that use them.

Cluster Analysis

Cluster analysis is the process of assigning a population of items to groups or clusters according to their similarities, defined by a single attribute or a set of specific attributes. You can cluster or group virtually anything including types of data, hi5 accounts, people in a city, political parties, and analytical methodologies. Also, you can cluster them by almost any attribute. We clustered the analytical methodologies in this book according to the attributes of ease of implementation, relevance, and applicability. The methodologies in Chapters 5 and 6 are easier to implement, and more directly relevant and applicable to social media data. The ones in this chapter rank somewhat lower on all three counts.

Essentially, cluster analysis helps you answer the question: “How can I separate a large number of people or things into different groups or clusters?” By using cluster analysis, you can focus your resources and analysis only on the groups of people or things that interest you the most. If you are faced with analyzing a large number of blogs and wish you could focus only on the blogs that, for example, mentioned certain topics, then cluster analysis can help you. Or if you are faced with marketing a crowdsourcing platform to a large number of people and wish you could focus only on the people who, for example, are young and Internet-savvy, then cluster analysis can help you.

The Process of Clustering

You can cluster things in a number of ways, and the best way depends on the question you want to answer. As long as you have at least two of something in a population, you can separate the population into a number of different clusters. The clusters can also have subsidiary clusters, thereby producing a hierarchy of clusters. Some of the clusters can even overlap. Consider a simplistic example to make sure you understand the concept behind clustering and the process of clustering. Say you have a population of shapes, as depicted in Figure 12.1.

Figure 12.1 Population of shapes

Notice that some of the shapes have outlines made up of jagged dotted lines, whereas others have smooth straight lines. The shapes also differ in the color with which they are filled. Some are circles and some are squares. You can cluster the shapes in a number of different ways. You can cluster them by the type of outline they have or by their shape type. Say you choose to cluster them by the type of outline. You can then further cluster each existing cluster by the color with which they are filled. You now have hierarchical clusters. You can then also cluster by shape type and create a cluster for shapes that are squares and a separate cluster for shapes that are circles. Notice that some of the shapes that are together in one cluster set (by shape type) may be separate in another cluster set (by outline type). In this case, the shapes belong to more than one cluster and are said to be overlapping. Figure 12.2 shows these different cluster sets.

Figure 12.2 Hierarchical cluster sets
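The grouping described above can be sketched in a few lines of Python. This is a minimal illustration only: the five shapes and their attribute values are hypothetical and are not read off Figure 12.1, and the helper name cluster_by is our own.

```python
from collections import defaultdict

# Hypothetical population of shapes; the attribute values are illustrative
# and are not taken from Figure 12.1.
shapes = [
    {"id": 1, "outline": "dotted", "fill": "white", "kind": "circle"},
    {"id": 2, "outline": "dotted", "fill": "gray", "kind": "square"},
    {"id": 3, "outline": "solid", "fill": "white", "kind": "square"},
    {"id": 4, "outline": "solid", "fill": "gray", "kind": "circle"},
    {"id": 5, "outline": "solid", "fill": "white", "kind": "circle"},
]

def cluster_by(items, *attributes):
    """Group item ids by one or more attributes.

    Passing several attributes produces a hierarchy: the first attribute
    defines the top-level clusters, the next one splits each of those.
    """
    clusters = defaultdict(list)
    for item in items:
        key = tuple(item[name] for name in attributes)
        clusters[key].append(item["id"])
    return dict(clusters)

by_outline = cluster_by(shapes, "outline")
by_outline_then_fill = cluster_by(shapes, "outline", "fill")  # hierarchical
by_kind = cluster_by(shapes, "kind")                          # a second cluster set

print(by_outline)  # {('dotted',): [1, 2], ('solid',): [3, 4, 5]}
print(by_kind)     # {('circle',): [1, 4, 5], ('square',): [2, 3]}
```

Shapes 1 and 4 sit together in the circle cluster but apart in the outline clusters, which is the overlap the text describes.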

We have barely touched the surface of how you can cluster a population of items and what attributes you can use to define the clusters. We will not even get into the process of soft or fuzzy clustering, which takes place when an item is said to belong only partially to a cluster.

Also, note that there is no objectively correct way to cluster. Like network analysis, cluster analysis involves applying one or a combination of several algorithms to a data set. Each algorithm clusters a data set differently, even if they are clustering according to the same attribute. Network analysis only provides you with an estimation of the importance of the node to the network. Similarly, cluster analysis only provides you with an estimation of the relationship and closeness of a node to other nodes. Different tools use different algorithms that will give you different results, and no one tool or algorithm is necessarily more correct. Some tools, however, are more appropriate for certain situations or types of data. When you search for cluster analytics online, you are likely to come across clustering by a simple formula of mathematical distance. We describe this process in an example later.

Many cluster analytical tools exist, but only a few are relevant for social media data. Many existing tools focus on clustering people or other potential nodes in a network (such as individual blogs in a network of blogs that link to each other) according to their position and status in the network. In other words, the tools cluster nodes in a network according to attributes such as their centrality in the network or the number of links they have with other nodes. In fact, NodeXL features cluster analysis tools that cluster according to a node's relationship to the network to which the node belongs. NodeXL version 1.0.1.224 enables users to use three different clustering algorithms on their social media data. Expect tools to emerge that cluster, for example, social media users by the type of words they tend to use, their behavior on a social media site, or their sentiment toward certain topics.

The Relevance of Clustering

Clustering has a wide variety of applications, and analysts have used it in other fields for years. Analysts have used it for grouping genes according to their expression patterns, customers according to various marketing segments, and crime hot spots according to their history of crime. It can also help you solve security problems related to social media in a number of ways. The most salient is that clustering can help you further understand social networks and relationships between people. You can use clustering to:

Separate Twitter users who make up a sprawling social network into small groups. You can then better focus your efforts and conduct social network analysis only on the most relevant groups instead of the entire network.
Identify the subgroups of a target audience to which you should market your crowdsourcing platform, and then direct your marketing campaign only at them.
Separate a large amount of user-generated social media content into groups by their time of creation so you can analyze only the most recent content.

At first, focus on clustering people rather than things or pieces of data because the act of clustering people is usually easier to grasp. Work on clustering people by several attributes including their age, behavior on social media, likes and dislikes, position in a social network, and even the news sites they usually tweet/retweet. If you are still unsure about how clustering works and how you can use it, consider the following example, which clusters by a formula of mathematical distance. Also consider the references for more resources.¹

Clustering Target Audience Example

Imagine you are deploying a solutions platform to crowdsource algorithms that process and decipher antagonistic behavior on video feeds, such as the platform we described in a walkthrough in Chapter 10. You need to market the platform to your target audience, who are primarily tech-savvy geeks and connected enough to spread the word to others. However, only a few members of the audience have the free time to work on your platform, and fewer have a large social network and can help you get the word out. Because your marketing resources are finite, you need to target only members of the target audience who are the most likely to use your platform and tell the most people about it on your behalf. You would like to cluster members of your target audience into a group that is likely to participate on your platform and has a big network (which we will call “Cluster Yes”) and a group that is not as likely to participate and has a small network (which we will call “Cluster No”). Your marketing resources may increase in the future so you would like to keep track of both groups, in case you have the resources to market to Cluster No later.

You can use a number of different methods to gauge a person's free time and social network. For the sake of simplicity, assume you decide to measure a person's free time by his or her age. You assume that people in their 20s and 30s have more time to work on your project than people aged 40 and up because older people are more likely to have families and more job responsibilities. You also assume that you can gauge a person's social network prowess by the number of people following him or her on Twitter. You have the required information (ages and Twitter followers) for a thousand people who you believe belong in your target audience.

To begin clustering the thousand people into two groups using a simple mathematical distance formula, you need to first create two seeds or pick two seeds from the target audience, one for each group. A seed represents the person who typifies the group. So, the seed or person who ideally represents Cluster Yes is a person who is aged 30 and has 30 followers on Twitter. The person who ideally represents Cluster No is aged 50 and has 10 followers on Twitter. You now need to place the other audience members into the two clusters.

You start with a member of your target audience who is aged 33 and has 12 followers on Twitter (call this person “Person Three”). Person Three is in the right age group but has a tiny social network. You need to determine the cluster to which she belongs. For mathematical reasons we will not get into here, simply taking the difference between a person's age and number of followers with that of the seeds will not work.

Instead, first take the difference between the age of Person Three and the age of the seed from Cluster Yes, which we will call Cluster A for the calculation. Then, take the difference between the number of followers of Person Three and the number of followers of the seed from Cluster A. Square the differences and take their sum (we will call the sum “Sum A”). Then, repeat the process but with the seed from Cluster No, which we will call Cluster B. Take the difference between the age of Person Three and the age of the seed from Cluster B, and the difference between the number of followers of Person Three and the number of followers of the seed from Cluster B. Square the differences and take their sum (we will call the sum “Sum B”). Table 12.1 shows the calculations.

Table 12.1 Clustering Calculations

Finally, compare Sum A and Sum B. The smaller the sum, the closer the person is mathematically to the seed. In this case, Person Three has a smaller Sum B (293) than Sum A (333), and so belongs in Cluster B, which is Cluster No. Person Three is in the right age group but her serious lack of Twitter followers indicates she is not an ideal person to whom you should market.
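The arithmetic can be checked with a short script. The sketch below assumes that Cluster A is Cluster Yes (seed aged 30 with 30 followers) and Cluster B is Cluster No (seed aged 50 with 10 followers), which is the reading under which the sums come out as 333 and 293. The function and variable names are our own.

```python
def squared_distance(person, seed):
    """Sum of squared differences across all attributes."""
    return sum((p - s) ** 2 for p, s in zip(person, seed))

# Each tuple is (age, Twitter followers), as given in the example.
seed_a = (30, 30)        # assumed to be the seed for Cluster Yes
seed_b = (50, 10)        # assumed to be the seed for Cluster No
person_three = (33, 12)

sum_a = squared_distance(person_three, seed_a)  # 3^2 + 18^2 = 9 + 324
sum_b = squared_distance(person_three, seed_b)  # 17^2 + 2^2 = 289 + 4

assigned = "A" if sum_a < sum_b else "B"
print(sum_a, sum_b, assigned)  # 333 293 B
```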

However, the cluster process is not yet finished, even for Person Three. Every time you place all people in a cluster, you need to take the average of every person's age and the number of followers in that cluster to come up with a new seed that accurately represents the average person in that cluster. You need to do this process for both clusters. You then need to kick all the people out of the clusters, and then recalculate which cluster they belong to and reassign them. You need to continue to do this process until you reach a steady state where no matter how many times you recalculate and reassign, people end up in the same cluster. Clearly, you need to use a computerized statistical tool such as NodeXL's clustering tools to cluster efficiently.
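The reassignment loop described above is commonly known as k-means clustering. The following is a minimal sketch of that loop in plain Python, not a reproduction of any particular tool's algorithm. Apart from the two seeds and Person Three, the data points are hypothetical, and all function names are our own.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(people, seeds):
    """For each person, return the index of the nearest seed."""
    return [min(range(len(seeds)), key=lambda k: squared_distance(p, seeds[k]))
            for p in people]

def recompute_seeds(people, labels, seeds):
    """Replace each seed with the average member of its cluster."""
    new_seeds = []
    for k, old_seed in enumerate(seeds):
        members = [p for p, label in zip(people, labels) if label == k]
        if not members:                      # keep the old seed if a cluster is empty
            new_seeds.append(old_seed)
            continue
        new_seeds.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return new_seeds

def cluster(people, seeds, max_rounds=100):
    labels = assign(people, seeds)
    for _ in range(max_rounds):
        seeds = recompute_seeds(people, labels, seeds)
        new_labels = assign(people, seeds)
        if new_labels == labels:             # steady state: nobody changed cluster
            break
        labels = new_labels
    return labels, seeds

# (age, followers) pairs. Only (33, 12) and the two seeds come from the
# example; the other five people are hypothetical.
people = [(33, 12), (28, 35), (31, 27), (52, 8), (47, 14), (25, 40)]
labels, seeds = cluster(people, seeds=[(30, 30), (50, 10)])
print(labels)  # [1, 0, 0, 1, 1, 0]
```

Label 0 corresponds to the first seed and label 1 to the second. Because age and follower counts are on different scales, real applications usually rescale the attributes before measuring distance. That step is omitted here to stay close to the example.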

Geo-Spatial Network Analysis

Chapter 5 described how to conduct social network analysis. As you may recall, social network analysis is only one type of network analysis. You can perform network analysis on virtually any complex system where discrete entities are linked to each other. For example, you can conduct network analysis on a network of blogs that link to each other to find the most influential blog. You can even conduct network analysis on infrastructures such as oil pipelines to find parts of the pipelines that are the most essential to the functioning of the complete pipeline system. In such a system, the points of intersections of pipelines are the nodes and the pipelines are the links between the nodes. You can also use it on subway systems to find the stations that are the most important to the proper and timely function of the entire system. In such a system, the subway stations are the nodes and the lines connecting stations to each other are the links. Some research, some of which we conducted, suggests that some terrorists may even be picking which stations to attack based on the stations' network measures such as their betweenness centrality.² By identifying the stations with the most centrality or importance, the terrorists can maximize the damage they cause to the entire subway system and hence the feeling of terror in the victims' community. Law enforcement can use the same tools to identify the most critical subway stations and deploy resources accordingly. City planners can also use the tools to design subway systems where a few stations do not become so critical to the entire system that their elimination would totally disrupt the entire subway system.

Because of easy access to network analysis tools, analysts are expanding the systems to which they apply network analysis and are broadening the types of questions they are trying to answer with it. All sorts of insights are tumbling out and answering questions in unexpected ways. The proliferation and easy availability of social media data is adding to this golden age of network analysis. One interesting and relevant insight is that the likelihood a group will commit violence internally or against another group appears related to how it is linked with other groups by virtue of being next to them in physical space.

The Process of Geo-Spatial Network Analysis

In a geo-spatial population system, each group or population of people is a node, and if they share a physical boundary with each other they are linked to each other. How you define what constitutes a certain group depends on various demographic and regional factors, and the level of analysis in which you are interested. Using this way of considering nodes and links, you can create a geo-spatial network map of any distributions of populations in a region. To aid understanding, consider Figure 12.3, which shows a geo-spatial network map of the central European countries. You will see that the Germany node is linked to the Czech Republic node because it shares a physical boundary with the Czech Republic. However, the Germany node does not share a link with the Hungary node because Germany does not share a physical boundary with Hungary.

Figure 12.3 Example of a geo-spatial network map of Central Europe
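To make the node-and-link definition concrete, the sketch below builds an adjacency map from a list of shared borders. The six countries are an illustrative subset chosen by us and are not meant to reproduce Figure 12.3. The function names are our own.

```python
# Shared land borders among six Central European countries. This subset is
# illustrative and is not meant to reproduce Figure 12.3.
borders = [
    ("Germany", "Poland"),
    ("Germany", "Czech Republic"),
    ("Germany", "Austria"),
    ("Poland", "Czech Republic"),
    ("Poland", "Slovakia"),
    ("Czech Republic", "Austria"),
    ("Czech Republic", "Slovakia"),
    ("Austria", "Slovakia"),
    ("Austria", "Hungary"),
    ("Slovakia", "Hungary"),
]

def build_network(edges):
    """Turn a list of shared boundaries into an undirected adjacency map."""
    network = {}
    for a, b in edges:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network

def linked(network, a, b):
    """Two nodes are linked only if they share a physical boundary."""
    return b in network[a]

network = build_network(borders)
print(linked(network, "Germany", "Czech Republic"))  # True
print(linked(network, "Germany", "Hungary"))         # False
```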

After mapping out the geo-spatial network map of the groups in a region, you can then conduct network analysis on the map. Early research suggests that groups (nodes) with the highest measures of centrality are more likely to engage in internal or civil war with members of their own group and less likely to engage in external conflict with members of other groups, regardless of other variables. A pilot study using this method found that an African country is more likely to engage in civil war and less likely to engage in external conflict if it has high measures of centrality.³ Although much more work and many more studies need to be done to verify the validity of this conclusion, the insight seems promising. If you need more details to understand this concept, check the book's website for a related study we are completing.

The Relevance of Geo-Spatial Network Analysis

Geo-spatial network analysis is relevant to social media because social media provides the venue through which you can collect data to forecast the type and likelihood of conflict that might take place in an area. You can crowdsource information about where certain groups and populations are located and whom they share boundaries with to do geo-spatial network analysis. Incorrect or incomplete data can seriously hamper your ability to do network analysis, and crowdsourcing enables you to collect timely and detailed data.

Crowdsourcing also enables you to collect data about nomadic groups and shifting boundaries with a level of accuracy and detail that traditional census data collection methods cannot match. You can then use the crowdsourced data to conduct geo-spatial network analysis and get an idea about where groups are located and how the location of the groups will impact the type of conflict in which they are likely to engage. Think of the analysis as another way to forecast the likelihood and type of conflict, and another way to understand why certain conflicts occur. Using network analysis in this way is especially relevant in cases where you do not have other information that can help you forecast and understand conflict.

Overall, it is not completely clear why a group that shares boundaries with lots of other groups is more likely to engage in internal conflict, while a group that shares few boundaries with other groups is more likely to engage in external conflict. However, remember that human and group behavior is often counterintuitive and heavily influenced by seemingly unimportant factors. Consider the following example for a more thorough understanding.

Geo-Spatial Network Analysis Example

Imagine you need to determine the likelihood that certain clans will engage in external or internal conflict in a certain region. You do not have much information about the clans and the reasons why they may engage in any conflict. You do have a general sense of the number of people in each clan and where they are distributed across the region. However, you do not have detailed information about exactly where the clans are at a specific moment in time and with whom they are sharing physical boundaries.

Say you create an SMS crowdsourcing capability, such as the one we described in the first walkthrough in Chapter 9. The capability simply sends SMS messages to people distributed across the region, asking about their clan affiliation and location. Over time, you build up a database of enough samples of people with information about what clan they belong to and their location. You can then create a topographical map of the clan distribution in the region. For illustrative purposes, say you create a map as shown in Figure 12.4. The map tells you the clans, where they are physically located, and with whom they share boundaries. (For simplicity's sake, we will label each clan with only a letter from the alphabet.)

Figure 12.4 Example map showing clan distribution

You can then use the population distribution map to create a geo-spatial network map for analysis. Treat each clan as a node and each boundary between clans as a link between them. Then use UCINET, NodeXL, or your network analysis software of choice to create a network map as shown in Figure 12.5.

Figure 12.5 Example geo-spatial network map of clans

Note

We posted the UCINET data file, geo_spatial.##h, we used to do the analysis in this example on the website.

You can now conduct network analysis on your geo-spatial network data set. Different algorithms produce different results, and we are not sure yet exactly which algorithm will give you the best result. However, past research and experience suggest that UCINET's algorithm for betweenness centrality may provide the best results. You simply need to run the betweenness centrality algorithm on the data set. We posted our results in Table 12.2, in descending order of betweenness centrality ranking. The clans at the top have the highest centrality rankings, and the ones at the bottom have the lowest.

Table 12.2 Centrality Rankings of Clans

Ranking	Clan
1	M
2	K
3	G
4	J
5	L
6	B
7	H
8	D
9	F
10	C
11	A
12	E
13	I

The results indicate Clans M and K are more likely to engage in internal conflict, whereas Clans E and I are more likely to engage in external conflict with other clans.
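If you want to see what a betweenness centrality calculation does, the sketch below computes it for a small network in plain Python using Brandes' algorithm. It is not UCINET's implementation, and the five-clan network (V through Z) is hypothetical. It does not reproduce the boundaries in Figure 12.4 or the rankings in Table 12.2.

```python
from collections import deque

def betweenness_centrality(network):
    """Betweenness for an unweighted, undirected network (Brandes' algorithm)."""
    score = {node: 0.0 for node in network}
    for source in network:
        # Breadth-first search from the source, counting shortest paths.
        order = []
        predecessors = {node: [] for node in network}
        paths = {node: 0 for node in network}
        distance = {node: -1 for node in network}
        paths[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in network[v]:
                if distance[w] < 0:                  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:   # v lies on a shortest path to w
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in network}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every undirected pair was counted once from each end.
    return {node: value / 2 for node, value in score.items()}

# Hypothetical clans and shared boundaries: V-W, W-X, X-Y, X-Z.
clans = {
    "V": ["W"],
    "W": ["V", "X"],
    "X": ["W", "Y", "Z"],
    "Y": ["X"],
    "Z": ["X"],
}
scores = betweenness_centrality(clans)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores["X"], scores["W"])  # 5.0 3.0
```

Clan X lies on the shortest path between five pairs of other clans and Clan W on three, so they rank first and second. Clans V, Y, and Z lie on none.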

Geo-spatial network analysis and social network analysis on their own are powerful tools. The real fun starts when you combine the different networks and layer them on top of each other. In other words, you create a social network map of all the people you are interested in, and then layer them on top of a geo-spatial network map of where they are located. Although we have not done the research, we expect that such layering and cross-analysis between the two maps might produce some interesting insights about how social networks and physical locations influence each other and the tendency to engage in conflict.

Note that clustering can also help you determine how you define what constitutes a group. You then reach a point where you are combining different types of analytical methodologies and crowdsourced data to produce a powerful and elegant way to forecast and understand conflict.

Agent Modeling

The universe of social media, comprising the people who use it, how they use it, and what they create on it, is a highly complex, inter-related, and dynamic system. Countless factors and variables go into influencing people's behavior, which in turn influences the type of data they create on social media, which influences their behavior. Trying to understand people's behavior on and because of social media is difficult to do. Trying to forecast how people might behave in the near future because of the influence of social media in certain scenarios is even harder. Fortunately, emerging and existing computational modeling tools can help you model, forecast, and make sense of people's behavior vis-à-vis social media.

Agent modeling or agent-based modeling is a type of computational modeling where you can simulate the mental and physical behaviors of any discrete, autonomous entity or agent to better understand and forecast the entity's behavior. Simply put, agent modeling enables you to virtually model any (single or numerous) person, group, or thing based on a variety of data sources including social media. Once you can virtually model a person, group, or thing, you can program the virtual model to behave in specific ways, and watch as it interacts with other entities and reacts to virtual situations and environments. You can then identify how the behavior of the entities changes over time. Agent modeling is used in a variety of fields such as economics, biology, and ecology to model and forecast everything from the interactions between wolves and sheep to the shifting biases of intelligence analysts to the behavior of companies on a stock market. Numerous types of agent modeling tools exist and they differ greatly in how they model entities, how many and what type of entities they model, how they let the entities behave, what data they use to model the entities, what types of environments they place the entities in, and what they output. Regardless, the entities you model must be discrete and decentralized actors.⁴

Agent modeling may seem difficult to understand at first, but it is actually fairly intuitive and elegant. The operational details are difficult to fully grasp, but the overall concept is not. We encourage you to read the subsequent sections with an open mind so you can fully grasp the concept of agent modeling. We are big fans of certain types of agent modeling and use it in our work to solve seemingly intractable problems. We hope you will also.

The Relevance of Agent Modeling

To aid understanding, we discuss the relevance of agent modeling to you and social media before we delve into how it works. Understanding the relevance of agent modeling will spur you to think about how you can use it as you learn how it works. You may want to read this section again when you get to the end to fully grasp the concepts discussed.

Agent modeling can help you model and forecast, for example:

How a specific social network on a social media platform will grow and evolve into the future
The behavior of rioters (groups and individuals), their physical location and likelihood to engage in violence, and how information on social media is influencing them
How participants on your crowdsourcing platform will respond to specific influence injects

Numerous other applications exist. Some, such as those listed above, are directly relevant to the interaction between social media and people who use it. Other security applications are more indirectly relevant and may use social media simply as one of many data sources or even not at all. In many cases, the more data you use to inform and program the agents, the more accurate and precise the model will be. Also, many existing agent modeling applications simply build models of people or events and then do not update them with real-world information. You can now process real-time social media feeds and constantly input them into your agent application to build a more realistic and accurate model.⁵ Other security applications that may not involve social media as heavily may include modeling and forecasting:

The placement of IEDs along a certain road
The logistics of transferring a unit and its equipment to a certain location
The behavior of people evacuating a building

More advanced applications of agent modeling, which we work on, involve using it to tease out patterns in all sorts of data including social media. In such cases, agent modeling tools can identify how the emergence of a set of data points tells you that a specific behavior or event is about to take place or has taken place.⁶ For example, say that the tool downloads public information from sites known to be popular among violent extremist groups and information about the purchase of sensitive and hazardous materials in certain cities. The tool can then identify that a person used the same credit card to purchase certain types of dangerous chemicals from different stores. The tool can also identify that a person visited an online forum where he discussed purchasing the chemicals to make a bomb, and that on another site, a person with a username similar to the one used on that forum made threats to members of a religious group at around the same time as the other actions. The tool can then autonomously connect the dots and send a warning to law enforcement and the religious group that a person in their locality may be making such a bomb to attack them. Such an ambitious tool is not yet fully operational, but we hope to make it so very soon.

Note

Visit the website to learn more about this advanced type of data pattern analytics and other interesting agent modeling applications.

The Process of Agent Modeling

Agent modeling may sound too good to be true, and in some cases it is. You need to understand which type of agent modeling to use and when to use it. We primarily discuss a subset of agent modeling known as multi-agent modeling or multi-agent systems. (Different types of multi-agent systems exist, but we will not get into them here.) In a multi-agent system, numerous agents, following a few simple rules of behavior, interact with each other and their environment. The product of their repeated interactions is a change in the makeup or behavior of agents and an overall solution to a problem. For example, say you are mapping the social network of human traffickers across Europe by monitoring their cell phone usage. You may like to know how the social network of traffickers will evolve over time and how it will change in response to police action against certain parts of the network. You can model the traffickers as agents and watch how their interactions and social network evolve over time and how it changes in response to police action. You can then get an idea of, for example, what the network will look like three months from now (which agents form links and relationships with other agents), or how the network will adapt if you cripple a part of it.

Right about now, you probably would like to ask us the following two sets of questions:

How do you train the agents to behave as traffickers and how do you make sure they are behaving realistically? If you program agents to behave like traffickers, then you are assuming you know about the behavior of traffickers. Then is the model not based on a bunch of assumptions that could be very wrong?
Human behavior is extremely complex and if you tried to program an agent to behave like a human, you would have to program hundreds or even thousands of rules of behavior and variables. To get around this problem, you give the agents only a few rules of behavior. However, how can a few agents following a few simple rules tell us something about complex behavior?

Understanding the questions and their answers is critical to understanding agent modeling. The answers lie in the concepts of swarm intelligence and genetic algorithms.

Swarm Intelligence

The type of agent modeling we prefer is a data-driven, bottom-up method that has its roots in induction. In other words, the agents produce results without you telling the agents what the results should be. This bottom-up approach is similar to the concept of swarm intelligence, which is the collective behavior of a system of numerous discrete, decentralized actors. Swarm intelligence explains how you can train agents with only a few simple rules but then watch them interact and produce complex, system-wide behavior, also known as emergent intelligence.⁷

Swarm intelligence derives its name from the fact that swarms of birds and ants are able to engage in amazingly complex behavior even when each bird or ant is relatively dumb. Understanding how swarm intelligence works in nature will help you understand how it works in agent modeling. Consider how ants successfully forage for food at a large scale, given the fact that each ant by itself has little intelligence. Say you have an ant colony surrounded by a few sites containing food. Some of the sites have a lot of food and some have little. Figure 12.6 shows a schematic of the ant colony and the distribution of food sites. The ants have to figure out a way to find the food, tell other ants about it, and carry the food back to the colony.

Figure 12.6 Ant colony and food site distribution

Each ant follows only a few simple rules, much like how agents in most agent modeling circumstances follow only a few simple rules. The rules are:

Move stochastically or randomly around the ant colony looking for food.
Upon finding food, carry it back to the ant colony.
Drop chemical pheromones, which evaporate over time, while walking.
Move toward the direction with pheromones, or in other words, be attracted to pheromones.

With these rules in mind, consider what happens over time as the ants move out from the colony. First, the ants wander around looking for food. Eventually, a few ants stumble upon the food, pick it up, and carry it home. These ants now know the location of the food. As they carry the food home, they drop pheromones on the trail between the food and the colony. Other ants stumble onto the pheromones and start following the trail of the ants that discovered the food. They then find the food and also start dropping pheromones on the trail from the food source to the colony. As the amount of pheromones on the trail grows, more ants become attracted to it. In time, the food at that site runs out, ants stop reinforcing the trail, and its pheromones evaporate. Meanwhile, another trail to another food source starts growing in pheromone intensity, and ants become interested in that trail. The pheromones of ants that moved around randomly without finding food also evaporate quickly, because with no food to return to, those ants do not walk the same paths repeatedly. Over time, the ants stumble onto all the food sources and successfully forage for food. Figure 12.7 illustrates this swarm behavior.

Figure 12.7 Swarm behavior of ants foraging for food
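
The four rules can be sketched as a toy simulation. The Python code below is a minimal illustration under stated assumptions, not a model taken from the references: the grid size, the three food sites, the evaporation rate, and the function name simulate_foraging are all hypothetical. As a simplification of the third rule, ants here deposit pheromone only while carrying food home rather than on every step.

```python
import random


def simulate_foraging(size=21, n_ants=30, steps=1000, evaporation=0.05, seed=0):
    """Toy ant-foraging model on a square grid (all parameters are assumptions)."""
    rng = random.Random(seed)
    colony = (size // 2, size // 2)
    # Hypothetical food sites: cell -> units of food (some rich, some poor).
    food = {(3, 3): 40, (17, 5): 10, (10, 18): 25}
    initial_food = sum(food.values())
    pheromone = {}  # cell -> pheromone strength
    ants = [{"pos": colony, "carrying": False} for _ in range(n_ants)]
    collected = 0

    def neighbors(pos):
        x, y = pos
        cells = [(x + dx, y + dy)
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
        return [(cx, cy) for cx, cy in cells if 0 <= cx < size and 0 <= cy < size]

    def step_toward(pos, target):
        x, y = pos
        tx, ty = target
        return (x + (tx > x) - (tx < x), y + (ty > y) - (ty < y))

    for _ in range(steps):
        for ant in ants:
            if ant["carrying"]:
                # Rules 2 and 3: carry food home, marking the trail on the way.
                pheromone[ant["pos"]] = pheromone.get(ant["pos"], 0.0) + 1.0
                ant["pos"] = step_toward(ant["pos"], colony)
                if ant["pos"] == colony:
                    ant["carrying"] = False
                    collected += 1
            else:
                # Rules 1 and 4: move randomly, but favor cells with pheromone.
                options = neighbors(ant["pos"])
                weights = [1.0 + pheromone.get(cell, 0.0) for cell in options]
                ant["pos"] = rng.choices(options, weights=weights)[0]
                if food.get(ant["pos"], 0) > 0:
                    food[ant["pos"]] -= 1
                    ant["carrying"] = True
        # Pheromone evaporates every tick; unreinforced trails fade away.
        for cell in list(pheromone):
            pheromone[cell] *= 1.0 - evaporation
            if pheromone[cell] < 0.01:
                del pheromone[cell]

    carried = sum(ant["carrying"] for ant in ants)
    return collected, sum(food.values()), carried, initial_food


if __name__ == "__main__":
    collected, remaining, carried, initial = simulate_foraging()
    print(f"collected={collected} remaining={remaining} "
          f"in transit={carried} of {initial}")
```

No ant is told where the food is or how to organize the colony. Any trails that form, and any food that reaches the colony, result only from the local rules and the evaporating pheromone.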

A few dumb ants knowing only a few rules interacting with each other can exhibit impressive emergent intelligence. Similarly, a few agents knowing only a few rules interacting with each other can produce complex system-wide behavior that impacts them and their environment. In the case of the trafficker network example, you would simply program a few rules into the traffickers and let them interact and form links with each other, somewhat randomly. The somewhat random part takes into account the fact that human behavior can often seem random and unexplained. The interactions will affect the traffickers and the traffickers will affect the interactions in untold ways, producing behavior you might not expect. You will need to use some past data to provide the trafficker agents with the few rules of how to interact and make links with other trafficker agents. To make sure that you are providing them with the right few rules and that the trafficker agents are behaving realistically, you may need to introduce a genetic algorithmic component into your agent model.

Genetic Algorithms

Genetic algorithms are processes and rules based on the phenomena of evolution and natural selection that enable computers to evolve agents that resemble the entities you are trying to model.⁸ Think of the computer in which you are creating your agent model as a planet. Virtual agents live in the computer planet in their virtual environments. You would like to fashion agents to resemble the entities you are trying to model. Initially, the agents do not have much of a personality and do not know how to behave with each other and their environment. Following our example from before, you need the agents to behave like traffickers and form a social network with other trafficker agents. You provide them with a few rules of behavior and modify their environment to fit that of the traffickers. The environment could be anything from the virtual representation of the physical infrastructure of a city, to the virtual representation of a social environment with virtual obstacles that make it difficult for some agents to interact with other agents.

So far you have agents with some intelligence and a tendency to act somewhat randomly and explore on their own, and an environment. However, you cannot be sure that the agents really behave like traffickers. You may have programmed them incorrectly and likely neglected to take into account some behaviors that significantly define the behavior of traffickers. Ideally, you would like the agents to figure out on their own exactly how to behave so that your prejudices, biases, and lack of knowledge do not sway them and ruin your agent model. Fortunately, your agents can somewhat figure out how to act on their own through the process of genetic algorithmic computing.

Before we continue, we need to refresh your biology knowledge and your understanding of natural selection. During reproduction, the genetic code of organisms often mutates and produces a few organisms with different characteristics and abilities. Some of the mutations improve the organism's ability to succeed in life (reproduce), some hamper it, and some have no effect. Over time, organisms compete with one another for resources and mating partners. Organisms with beneficial mutations are more likely to win the competition and produce more offspring, which propagates the mutation through the ages. Other organisms tend to lose the competition. In essence, nature selects against the losing organisms and for organisms that have evolved the characteristics and behaviors they need to succeed, that is, to reproduce in their environment at that time.

Genetic algorithmic computing follows this same process. You provide the agents with a few behaviors initially, but then you let the agents experiment, mutate, and evolve. Some agents begin exhibiting behavior patterns that help them succeed in their environment, and others evolve behaviors that hurt them. Instead of defining success as the ability of agents to reproduce, you can define success as how closely the agents come to resembling the traffickers (how closely the agents' networks resemble the traffickers' networks) at some point in time. Specifically, you map out what the traffickers' social network looks like at different periods of time. Say that you have information about the traffickers' social network covering the ten years from 2003 to 2013. You do not have perfect information about exactly what the network looked like at any point in time during that decade, but you do have a significant amount of information about what the network looked like at points throughout the decade. You then start your computer planet and agents at virtual year 2003. You program the agents to behave as you think traffickers probably behave and to form the social network the real traffickers had in 2003. The key is that you create multiple iterations of the computer planet with multiple populations of the trafficker agents. You then let the agents in the planets evolve. After some time, the computer planets reach the virtual year of 2006. At that point, in some planets the agents' networks will look like the real traffickers' network in the real 2006. In other words, some agents will succeed and accurately resemble the real traffickers. You select the agents that do resemble the real traffickers and destroy the other planets and agents.

You then let the agents evolve some more. You stop the clock again at the virtual year 2009 and see which agents' networks resemble the real traffickers' network in the real 2009. You again select for the agents that best resemble the real traffickers, and repeat the process. Eventually, you will end up with agents that evolved in such a way that they behave and form networks like the real traffickers in the real year 2013. You can then let the agents evolve into the future to see how the traffickers might act in the future. Or, you can examine how the evolved agents' behaviors differ from the behaviors you initially programmed to try to figure out how the real traffickers actually behave. Figure 12.8 illustrates this process.

Figure 12.8 Genetic algorithm process
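
The checkpoint-and-select process can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the method of any cited tool: the observed link counts in OBSERVED_LINKS are invented, each planet is reduced to two hypothetical behavior parameters (a yearly rate of forming links and a yearly rate of dropping them), fitness compares only the number of links rather than the full network structure, and the loop uses selection and mutation without the crossover step that many genetic algorithms include.

```python
import itertools
import random

# Hypothetical "real" observations: year -> number of links among traffickers.
OBSERVED_LINKS = {2006: 30, 2009: 45, 2013: 60}
N_AGENTS = 20
PAIRS = list(itertools.combinations(range(N_AGENTS), 2))


def clamp(value):
    """Keep a rate inside [0, 1]."""
    return min(1.0, max(0.0, value))


def new_planet(rng, start_links):
    """One virtual world: two behavior rates plus the 2003 starting network."""
    return {
        "form": rng.uniform(0.0, 0.2),  # yearly chance an unlinked pair links
        "drop": rng.uniform(0.0, 0.2),  # yearly chance a linked pair separates
        "links": set(start_links),
    }


def advance_one_year(planet, rng):
    """Agents interact somewhat randomly according to the planet's rates."""
    links = planet["links"]
    for pair in PAIRS:
        if pair in links:
            if rng.random() < planet["drop"]:
                links.discard(pair)
        elif rng.random() < planet["form"]:
            links.add(pair)


def error(planet, year):
    """Distance between the simulated and the observed network (lower is better)."""
    return abs(len(planet["links"]) - OBSERVED_LINKS[year])


def mutate(parent, rng, scale=0.02):
    """Copy a surviving planet, slightly perturbing its behavior rates."""
    return {
        "form": clamp(parent["form"] + rng.gauss(0.0, scale)),
        "drop": clamp(parent["drop"] + rng.gauss(0.0, scale)),
        "links": set(parent["links"]),
    }


def evolve(n_planets=40, seed=1):
    rng = random.Random(seed)
    start_links = rng.sample(PAIRS, 15)  # assumed network in virtual year 2003
    planets = [new_planet(rng, start_links) for _ in range(n_planets)]
    year = 2003
    for checkpoint in sorted(OBSERVED_LINKS):
        while year < checkpoint:
            for planet in planets:
                advance_one_year(planet, rng)
            year += 1
        # Keep the planets that best match reality; destroy the rest and
        # refill the population with mutated copies of the survivors.
        planets.sort(key=lambda p: error(p, checkpoint))
        survivors = planets[: n_planets // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(n_planets - len(survivors))]
        planets = survivors + children
    return planets, year


if __name__ == "__main__":
    planets, year = evolve()
    best = planets[0]
    print(year, round(best["form"], 3), round(best["drop"], 3), len(best["links"]))
```

The surviving planets come first in the returned list. Inspecting their form and drop rates corresponds to examining how the evolved agents differ from what you initially programmed, and continuing to call advance_one_year on them corresponds to letting the agents evolve into the future.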

You may need to reread the preceding sections a few times to fully understand the power of agent modeling and how it can help you. Admittedly, it took us some time to understand it fully, but we are glad we did. Check out the references for more information, some of which is technical but some of which explains the subjects we touched upon in layman's terms. In the appendix, we also list some websites and software solutions that can help you build simple and complex agent modeling capabilities. Some, such as George Mason University's MASON application (available at http://cs.gmu.edu/~eclab/projects/mason/) or Northwestern University's NetLogo application (http://ccl.northwestern.edu/netlogo/), are relatively easy to use and require little technical knowledge. Others are significantly harder and require specialized technical expertise.

Agent modeling, geo-spatial network analysis, and cluster analysis are impressive and powerful tools that can help your mission sets in exciting ways. However, always keep in mind that no analytical methodology is perfect. They all have their limitations and none can yet accurately and precisely model the world and all the intricate, complex beings and things that exist in it. Keep an open and skeptical mind as you research and use the methodologies we covered in this book. You will also need to keep an open and skeptical mind for Chapter 13, where we explore how you can use crowdsourcing to solve problems of health and education that are increasingly tied up with issues of security.

Summary

Many analytical methodologies, some of which are quite complex, use and analyze social media data. Over time, they will grow in their number, applicability, power, and relevance.
Cluster analysis is a method of assigning discrete entities into clusters or groups according to certain criteria.
Cluster analysis can help you categorize people, data, and things so you can better focus your efforts and understand your populations of interest.
Geo-spatial network analysis enables you to analyze the distribution of groups of people over a region to derive the likelihood a specific group will engage in internal or external conflict.
Crowdsourcing enables you to collect detailed data about groups of people, especially nomadic people, thereby improving your geo-spatial network analysis.
Agent modeling is the process of modeling the behavior of any discrete, decentralized entities to understand and forecast their behavior.
You can use agent modeling to model the past, present, and future behavior of entities ranging from rioters to the movement of warfighters to social networks on social media.
Many sophisticated agent modeling systems are based on the concepts of swarm intelligence and genetic algorithmic computing, which explain how virtual agents can evolve over time to produce and explain complex behavior.
No analytical methodology, regardless of its complexity and power, can perfectly explain and forecast the world and human behavior.

Notes

1. Everitt, B., et al. (2011) Cluster Analysis. Wiley, United Kingdom; Romesburg, C. (2004) Cluster Analysis for Researchers. Lulu Press, North Carolina.

2. Jordán, F. (2008) “Predicting Target Selection by Terrorists: A Network Analysis of the 2005 London Underground Attacks.” International Journal of Critical Infrastructure 4: 206-214; Gupta, R. (2012) Utilizing Network Analysis to Identify Critical Vulnerability Points in Infrastructure and Explain Terrorist Target Selection. Georgetown University Security Studies Program, UMI Dissertation Publishing.

3. Johnson, D. and Jordán, F. (2007) “The Web of War: A Network Analysis of the Spread of Civil Wars in Africa.” Annual Meeting of the American Political Science Association, Chicago.

4. Parunak, H.V.D., Savit, R., and Riolo, R. L. (1998) “Agent-Based Modeling vs. Equation-Based Modeling: A Case Study and Users' Guide.” Multi-Agent Systems and Agent-Based Simulation, 1534: 277-283; Epstein, J. (2006) Generative Social Science: Studies in Agent-Based Computational Modeling. Princeton University Press, Princeton, NJ; Railsback, S. (2011) Agent-Based and Individual-Based Modeling: A Practical Introduction. Princeton University Press, Princeton, NJ.

5. Parunak, H.V.D., Brueckner, S., Gupta, R., and Brooks, H. (2012) “Dynamically Tracking the Real World in a CSS Model.” Annual Computational Social Science Society of the Americas Conference, Santa Fe, NM.

6. Parunak, H.V.D., et al. (2009) “Stigmergic Modeling of Hierarchical Task Networks.” Proceedings of the Tenth International Workshop on Multi-Agent Based Simulation, 98–109.

7. Parunak, H.V.D. (1997) “'Go to the Ant': Engineering Principles from Natural Agent Systems.” Annals of Operations Research, 75: 69–101.

8. Mitchell, M. (2011) Complexity: A Guided Tour. Oxford University Press, Oxford, UK.
