Chapter 3
Introduction to Social Media Analytics
Analyzing social media data improves operational planning and execution. It can help you understand social networks and what they are discussing, identify key people and relationships, and understand and forecast events. However, analysis offers benefits only if it is conducted accurately and honestly. This chapter lays the foundation for learning social media analytics by defining what analysis is and is not, explaining how it can help, providing an overview of the analysis process, and introducing the analytical methodologies this book covers.
Conducting any type of analysis requires knowing what analysis is and how it differs from other ways of making sense of the world. Analysis also has limitations and can easily fall prey to corrupt practices. Only by understanding both the powers and the limitations of analysis can you start to learn how to analyze social media to solve various problem sets. The subsequent sections may seem academic and dry at times, but unless you are already comfortable doing quantitative analysis, you should work through them. They will help you analyze social media and other types of data, and appreciate and critique the analyses of others.
Analysis is the systematic study of relevant data to gain insights about a topic. It involves applying a variety of objective methodologies to evidence to find answers to specific problems. The methodologies are series of steps, derived from past analytical studies and theories, that will likely lead to accurate answers if applied correctly to data. This part of the book is dedicated to learning about the different methodologies and the proper ways of applying them.
Analytical methodologies can be applied to virtually any problem set, but we focus on solving a few security-related problem sets. Table 3.1 shows the relevant problem sets, their descriptions, and specific examples. Occasionally we also touch on how social media analysis can help solve other problem sets.
| Problem Set | Description | Specific Problem Examples |
| --- | --- | --- |
| Understand the structure of social networks | Track the development of relationships between people online and offline, and understand how people use social media to maintain online and offline relationships. | To what extent and why are human traffickers using social media to communicate with each other? |
| Identify key people and relationships | Determine who in social networks wields influence over people in the networks, and has the ability to affect their behavior and relationships. | Which online violent extremist recruiters are the most effective at recruiting at-risk youth? |
| Determine the proliferation of ideas in networks | Understand which topics and ideas individuals and groups are discussing and sharing. | What types of violent extremist literature, rhetoric, and ideals spread through online social media? |
| Understand and forecast behavior | Understand the relationship between behavior, environmental constraints, and discussions and networks on social media. Also, use the understanding and real-time data to determine how likely individuals or groups are to undertake a specific behavior in the future. | How are gangs using social media to inflame tensions with rival gangs, and is their use changing? |
| Understand and forecast events | Understand the relationship between events, environmental constraints, and discussions and networks on social media. Also, use the understanding and real-time data to determine the likelihood of specific events occurring in the future. | What is the likelihood that there will be a famine? |
The outputs of the analyses are concrete answers to sample problems, such as the Twitter handles of the most influential people in networks, the probability that violence will break out in a certain area, the name of the social media platform that played the most pivotal role in helping rioters organize, and much more.
As you read this chapter, keep in mind that analysis has limits. If used correctly, it offers tremendous insight into the most complicated subjects. However, analysis rarely results in certain predictions, precise rules describing human behavior regardless of environment, or perfect understanding. Analytical methodologies are tools that help you discover, describe, and forecast human behavior in the context of security. Due to the complexity of human behavior and the incomplete nature of the data in question, the relevant analytical methodologies are somewhat flawed and imprecise. The best way to push against the limits of analysis is to adopt analytical tools from other, seemingly unrelated fields and to be intellectually honest at all times.
Analysis comes naturally to humans as the basis of critical thought, but it is often riddled with cognitive fallacies and pitfalls that lead to incomplete or incorrect answers. Moreover, a significant portion of the security and defense world is staffed by people with no formal training in applying objective analytics to complex problems. What often passes for “analysis” is the subjective opinion of a biased individual with little real-world experience and/or no grounding in the scientific method. The process for such “analysis” is usually simplistic and consists of the following steps:
This faulty analytical process has resulted in misleading predictions about major foreign policy events and wasted taxpayer money. It is especially harmful when it comes to forecasting security events and discovering the causal effects that produce them. The United States government and other organizations often fund “analysts” who claim to forecast events but have no idea how to conduct analysis and routinely commit the analysis don'ts described later. If you are native to the foreign policy or defense world, learn to disregard the “analysis” of pundits and so-called experts. Ignore those who proudly call themselves experts or claim they can “predict” something with 100 percent certainty. On major security and foreign policy issues, “experts” often fare little better than randomly generated results: chimpanzees randomly throwing darts at a board plastered with predictions are almost as likely to choose the right answer as “experts.”1 Unless you have a strong interest or background in quantitative methods, statistics, or science, forget what you know about “analysis.”
The overall analytical processes and methodologies described herein are similar to what is taught in sophisticated political science and social science courses. One process uses a set of methodologies to create theories that aid understanding of security events and relevant behavior, and the second process uses another set of methodologies to apply those and other theories to other cases. Doing so will help us answer the types of problem sets described in Table 3.1. Specifically, the theories we develop and use will help describe rules of individual and group behavior in relevant contexts, determine the probability that a future event or action will take place, and identify relationships and causal effects between people, objects, and environments.
Unlike the “hard sciences” such as physics and chemistry, few hard-and-fast universal laws or strong theories exist in political and social science. We are primarily studying humans, not large static objects (although a lot of humans are increasingly behaving as large, static objects). The complexity of our subject makes determining laws for individuals and groups very difficult. Compared to the axiomatic laws of physics that describe the behavior of objects, determining the theories that describe humans and their “rational behavior” is incredibly complex and requires accounting for a dizzying array of factors.
Therefore, focus more on the data and allow the data to tell you about how humans are behaving in specific contexts. Then use what the data has said to help understand how humans are behaving in other similar contexts. This process or type of reasoning is known as inductive reasoning. In some cases, use existing proven and tested theories about how humans or other similar beings and objects behave individually and in groups. This process or type of reasoning is known as deductive reasoning. Of the analytical methodologies we describe, some are primarily inductive and others are primarily deductive, but all involve both types of reasoning. In reality, the two processes are often mixed and intertwined. We separate the two to aid understanding.
The following sections detail the overall analysis process. First, we describe the preliminary procedure for formulating the problem you are trying to solve. Second, we explore each of the two processes, or types of reasoning, and how to choose the appropriate one for the problem. Third, we list the analysis dos and don'ts. Fourth, we introduce several methodologies and describe when they are most useful.
The preliminary procedure helps lay the foundation of the analysis and mitigates the likelihood you will waste time and resources later. The procedure consists of four steps that we describe next and also summarize in Figure 3.1.
The very first step when conducting analysis involves figuring out what you want to analyze and why; in other words, formulating a problem of interest. In most cases involving security operations and analysis, formulating your problem of interest is not difficult. Based on your mission and role, a third party, commander, or boss has probably tasked you with either discovering more about something or someone and/or forecasting what they might do in the future. See Table 3.1 for sample problems.
The key is ensuring that your problem of interest is narrow and specific enough to make the analysis feasible and its results accurate. The following guidelines will help you narrow the problem:
After adequately formulating and focusing the problem, check the Internet to see if others have tackled a similar problem (scholar.google.com is a good resource). Being smart and lazy is the way to go. If others have paved the way intelligently and honestly, feel free to learn from their journey. You might stumble upon a methodology or a ready solution that will save you time and resources. Of course, do not outright steal or plagiarize the work of others. Check with the corresponding author to see how much you can borrow; he or she may be flattered and happy to help.
If your Internet search does not turn up a solution, begin assembling relevant data, which includes social media data as well as other data such as weather information, stock market indices, and demographic information. Chapter 4 is dedicated to determining what constitutes relevant data and to collecting and managing social media data. Due to the inconsistent nature of social media data, the type and amount of available data will vary considerably. Collecting data this early in the analytical procedure, and thus getting a sense of what is available, may drive the choice of analytical process and methodology.
Formal and academic guides to analysis usually suggest that you formulate a hypothesis before you start assembling data. A hypothesis is an educated guess about the likely solution to the problem. This advice should not go unheeded, because a hypothesis can help focus your study. If you can formulate a strong hypothesis based on existing studies and theories, you have a better sense of the type of data you need to collect and the types of analytical methodologies you need to apply. Understand, though, that this is a different and much more demanding process than formulating a hypothesis based on your unexamined biases or “gut.” Later paragraphs explain how and why.
However, you should first understand that in the emerging field of social media analytics, and given the complex and unique problems you are likely trying to solve, this advice is not always practical and is in some cases even harmful. A misguided hypothesis can introduce bias into your analysis and lead you to ignore certain data. You may then miss a range of solutions you could not even have imagined. Suppose you want to identify the most influential person on Twitter with regard to spreading violent neo-Nazi propaganda. You choose to study only one group of neo-Nazis because the news media routinely mention it as an example of a neo-Nazi group. Keep in mind that popularity in the news media does not necessarily equal influence within the specific population. You then miss the influence of other groups and how individuals in those groups may be influencing the group you chose to study.
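Because different sources rarely cover exactly the same days, assembling data usually involves aligning them on a shared key. The sketch below, using entirely hypothetical numbers, joins daily post counts to daily temperatures by date and keeps only the days present in both sources. A library such as pandas offers the same operation (`DataFrame.merge`), but plain Python keeps the example self-contained.

```python
from datetime import date

# Hypothetical daily counts of posts mentioning a topic of interest.
post_counts = {
    date(2015, 1, 1): 120,
    date(2015, 1, 2): 95,
    date(2015, 1, 3): 143,
}

# Hypothetical daily mean temperatures (degrees Celsius) from a weather source.
temperatures = {
    date(2015, 1, 1): -3.0,
    date(2015, 1, 2): 1.5,
    date(2015, 1, 4): 4.0,
}


def join_by_date(left, right):
    """Inner-join two date-keyed series, keeping only dates present in both."""
    shared_dates = sorted(left.keys() & right.keys())
    return [(day, left[day], right[day]) for day in shared_dates]


for day, posts, temp in join_by_date(post_counts, temperatures):
    print(day.isoformat(), posts, temp)
```

Note that January 3 and January 4 drop out because each appears in only one source. Checking how many observations survive the join is a quick way to gauge how the inconsistency of social media data will constrain your analysis.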
Generally, you should formulate a hypothesis if three criteria are fulfilled. One, the problem is not that unique and others have successfully attacked it or similar ones. Your hypothesis will then likely be a version of their solution. Two, established and tested theories exist that help describe the types of human behaviors in question. For instance, social networks tend to follow certain rules regardless of the context. Apply the theory to the problem to create a hypothesis that is grounded in experimental evidence. Three, the physical, virtual, and temporal factors constituting the problem's environment are so limited that the menu of likely solutions is small. For example, environmental factors may make it so that an event will certainly take place either tomorrow or next week. Based on your experience, knowledge of similar past events, and cues, you can then reasonably guess when the event may happen.
We caution against formulating a hypothesis in our case for several reasons. One, few people have published rigorous studies assessing the types of security problems and social media data most relevant to you. Two, social media analytics is far from formalized, and few proven theories exist that adequately and reliably explain behavior involving social media. Theories that describe human interaction and behavior offline may not always translate to interaction and behavior on social media. Three, the likely relevant environmental factors are not restrictive enough to narrow the menu of likely solutions. In fact, the environmental factors are probably more numerous, because the data and problems likely involve the behavior of many individuals who live in many different parts of the world and use many different social media platforms.
After formulating or bypassing the hypothesis and collecting the data, the preliminary procedure winds to a close. The next step is determining the type of reasoning or process most appropriate for conducting the analysis. In reality, the line between the preliminary procedure and determining the appropriate analytical process is blurry. Well into your analysis, you may stumble onto evidence or theories that quickly provide an adequate solution or further focus your problem. You may start out with a hypothesis and then abandon it if the analysis leads you down unexpected roads. You will also likely need to collect more data to complete the analysis. Be intellectually honest with yourself and flexible, and the blurry line will cease to be an issue.
Determining which process is most appropriate for your needs depends on the problem, other available work on the problem, and the amount of available data. Each process has different requirements, and provides distinct advantages and disadvantages. Understanding the concepts underlying each reasoning process will help you determine the ideal process and methodology. The most realistic option is to combine the different types of reasoning and modify the combination of processes to fit your needs.
Inductive reasoning or induction is a bottom-up, data-driven approach that involves identifying patterns in data and then codifying the patterns into theories that can also explain other data.
Inductive reasoning is an exploratory process that provides you with insights when you have little idea about possible solutions to your problem. When you fail to generate a hypothesis according to the aforementioned criteria, you need to use the inductive process. Analyzing social media data entails a lot of induction. As we mentioned before, the study of social media data, and of how social media and behavior affect each other, is fairly new. The study of social media and its relationship with security-relevant behavior is even more nascent. You will likely find few established theories that give you an idea about the solution to your problem. Figure 3.2 outlines the induction process.
First, gather a substantial sample of data or observations concerning the problem of interest. Try to collect a large enough sample, but do not become obsessed with collecting data: more data does not always give you more knowledge. Chapter 4 briefly covers how to determine an adequate data set size using statistical tools. However, do not worry if you cannot meet all statistical requirements. Usually, external factors will limit the size of your data set, and the likely issue will be a lack of data, not an overabundance. You may also lack the time and resources to collect data and meet all the statistical requirements.
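As a rough preview of the statistical side, here is one common formula (Cochran's) for the number of observations needed to estimate a proportion, such as the share of posts in a population that mention a particular group. It assumes a simple random sample, which social media data rarely provide, so treat the result as a guide rather than a requirement. The function name and defaults are illustrative choices, not a method prescribed by this book.

```python
import math


def sample_size_for_proportion(margin_of_error, z=1.96, expected_proportion=0.5, population=None):
    """Cochran's formula for the sample size needed to estimate a proportion.

    margin_of_error: desired half-width of the confidence interval (0.05 means +/- 5 points)
    z: z-score for the confidence level (1.96 corresponds to roughly 95% confidence)
    expected_proportion: prior guess at the proportion; 0.5 is the most conservative choice
    population: optional finite population size, used for the finite population correction
    """
    p = expected_proportion
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: small populations need fewer samples.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size_for_proportion(0.05))                   # 385
print(sample_size_for_proportion(0.05, population=2000))  # 323
```

The finite population correction matters when the population you care about is small, for example the posts of a single closed group, where sampling a large share of it quickly exhausts what exists.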
After you have the sample, use a number of statistical tools to apply the methodologies. We describe this process and the relevant tools in detail later. The methodologies and tools will help you pick out patterns in the data that, if strong enough, you can develop into rules that describe past, present, and future human behavior.
Lastly, codify the rules into a theory that elegantly describes a solution to your problem and general behavioral rules that could help solve other similar problems. Test the theory on other data sets and on other problems to make sure your theory is sound and applicable. Even if the theory holds true only for your data set and problem, do not despair. You may have discovered something unique about that data, which is a theory in its own right.
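The induce-then-test loop can be sketched in a few lines. This is a minimal illustration, not a method from this book: it assumes hypothetical labeled observations (a weekly count of inflammatory posts in an area, and whether violence followed), learns a single cutoff from an exploration data set, and then checks that cutoff against a separate holdout data set.

```python
def rule_accuracy(threshold, observations):
    """Share of (value, label) pairs that the rule 'value >= threshold means True' gets right."""
    hits = sum((value >= threshold) == label for value, label in observations)
    return hits / len(observations)


def best_threshold(observations):
    """Inductive step: choose the observed value that works best as a cutoff."""
    candidates = sorted({value for value, _ in observations})
    return max(candidates, key=lambda t: rule_accuracy(t, observations))


# Hypothetical data: (inflammatory posts in a week, did violence follow?)
exploration = [(2, False), (3, False), (5, False), (7, True), (8, True), (10, True)]
holdout = [(1, False), (4, False), (6, True), (9, True), (6, False)]

cutoff = best_threshold(exploration)
print(cutoff, rule_accuracy(cutoff, exploration), rule_accuracy(cutoff, holdout))  # 7 1.0 0.8
```

Perfect accuracy on the data a rule was derived from is expected and proves little. The score on data the rule has never seen is the more honest estimate of whether the pattern generalizes.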
There is nothing inherently bad about induction. However, misguided or overzealous use of induction can lead you down the wrong path. Incorrectly applying inductive methodologies to data will reveal incorrect patterns that lead to the development of incorrect theories and solutions. Also, induction will not work if you do not have an adequate amount of data. If your data consists of only a few samples, many of which may be outliers or unrepresentative data points, then induction will hurt more than help.
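To see how a single unrepresentative point can mislead induction from a small sample, consider hypothetical follower counts for five accounts, one of which belongs to a celebrity-like outlier:

```python
from statistics import mean, median

# Hypothetical follower counts for five accounts; the last one is an outlier.
small_sample = [120, 95, 140, 110, 250000]

print(mean(small_sample))       # 50093: a "typical" account that does not exist
print(median(small_sample))     # 120: closer to what most accounts look like
print(mean(small_sample[:-1]))  # 116.25 once the outlier is set aside
```

Whether to set an outlier aside is itself an analytical judgment that should be stated openly, not a convenient way to make the data fit a preferred pattern.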
We already introduced deductive reasoning in the induction case study. Step 7 in the case study is an example of deduction.
Deductive reasoning or deduction is a top-down, theory-driven approach that involves applying established theories and well-developed hypotheses on data to test the validity of the theory and hypothesis.
Deductive reasoning is a more formal and focused process that helps you confirm whether your educated guess about the likely solution to the problem is valid. When you can generate a hypothesis according to the aforementioned criteria, you should use the deductive process. You may still use induction, but deduction will save you time and effort by focusing your analysis on only a few possible solutions. In a few cases, deduction will be applicable. For example, many new studies confirm that theories governing how human social networks develop offline can, in some specific cases, explain how social networks develop on social media. Other studies confirm that how ideas spread through social networks is partly independent of the mode of communication. Assembling a comprehensive list of existing theories that apply to social media is difficult. You will have to do your own research, especially because which theories apply depends heavily on your problem. Figure 3.3 outlines the deductive process.
First, determine which existing theories, studies, and solutions to similar problems are the most relevant to your problem. Evaluate them to ensure they are analytically sound and applicable. They are applicable if they consider populations, behaviors, and environmental factors similar to those in your problem, and they suggest their rules and results are applicable to other problems and data sets. Compare competing theories if applicable, and formulate a hypothesis.
Next, gather the data necessary to support or refute your hypothesis. In some cases, deduction requires gathering less data than induction. If substantial literature and evidence back up your hypothesis, you can be reasonably confident that you only need to collect data about the items and factors your hypothesis considers. However, if you have the time and resources, collect other data. Later, you can apply the inductive process and corresponding methodologies to the extra data to ensure you did not miss anything.
After you have enough data, use a number of statistical and other tools to apply the methodologies. We describe this process and the relevant tools in detail later. The methodologies and tools will help you evaluate whether the theories that inspired your hypothesis are applicable, and the extent to which your hypothesis is valid.
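As one example of the kind of statistical tool involved, a permutation test checks whether an observed difference is larger than random relabeling of the data would produce. The sketch assumes a hypothesis of the form "posts from group A are shared more than posts from group B" and uses made-up share counts; it illustrates the idea and is not the specific procedure this book develops later.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test of the hypothesis mean(group_a) > mean(group_b).

    Returns an estimated p-value: the share of random relabelings of the pooled
    data whose difference in means is at least as large as the observed one.
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


group_a = [12, 15, 14, 18, 20, 17]  # hypothetical shares per post, group A
group_b = [8, 9, 11, 7, 10, 9]      # hypothetical shares per post, group B
print(permutation_test(group_a, group_b))
```

A small p-value means the data are hard to reconcile with "no difference between the groups"; it supports the hypothesis but says nothing about why the groups differ, which is where the refining step that follows comes in.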
Lastly, determine the solution to your problem by refining or junking the hypothesis and applicable theories on the basis of the analysis's results. You may discover that only parts of the hypothesis are valid as a solution. You will likely find that the hypothesis holds only under special conditions related to the items and factors in your problem. You may also discover that the hypothesis is completely wrong and what you thought was the solution is not valid, in which case you will need to either test other hypotheses or apply the inductive process. You may also refine existing theories and make them more elegant and simple, which you may validate further through other analyses or leave for others to validate.
Deduction is most appropriate when you can generate a well-informed and specific hypothesis. However, executing a deductive process with a poorly developed or weak hypothesis will likely lead to an invalidated hypothesis, frustration, and wasted time and effort. You will then have to start all over again. As a general rule, use deduction only when your hypothesis is strong, or when you cannot collect enough data to do induction.
You probably noticed that the final steps of each reasoning process involve applying parts of the other reasoning process. Few social media and security analyses are clearly defined as requiring either induction or deduction. Most require variants of both, sometimes in the middle of the analytical process and often at the end. As you conduct more analyses, you will learn when to utilize a reasoning process. There is no right answer because it is a cyclical process. Given time and resources, use induction constantly to discover insights and deduction to test them. Your analysis will be much stronger for it and you will develop your own analytical tools that you can deploy quickly in the future or use to educate others. Figure 3.4 graphically illustrates this process.
Before learning the different methodologies, it is necessary to adopt good analysis habits. Many of you are conducting analyses to support sensitive and dangerous operations. Adopting good habits and taking care to avoid pitfalls will drastically improve the reliability and integrity of your analysis. Think of abiding by the analysis dos and don'ts as insurance against charges of incompetence or sloppiness. Also, you will help develop the field of social media analytics by creating and sharing well-done analyses.
The following is a list of analysis dos. In an ideal world, you should abide by all of them, but in the real world where resources and time are limited, we encourage you to at least try. The act of trying will put the quality of your analysis head and shoulders above much of existing “analysis.”
As is often the case, the list of ways to do something wrong is a lot longer than the ways to do something right. The following is a list of analysis don'ts. Do your best to avoid them to ensure your analysis is intellectually honest, stands up to scrutiny, and provides you with accurate insights. Resource and data constraints may compel you to fall prey to an analysis mistake. Still, being aware of your analysis' faults will help you prepare for possible eventualities and negative fallout.
We provide only the most relevant dos and don'ts. Several more are available and we encourage you to read the book we cite in the notes and other books and articles on analysis so you can continue to improve your analysis.
Methodologies are the series of steps that will allow you to apply the two reasoning processes on the data. In this section, we introduce four methodologies that are the most pertinent for social media analysis and ones we use the most. Chapters 5 and 6 describe them in greater detail and illustrate how to use them to solve security-related problem sets. To solve complex problems, you will need to combine the methodologies. Keep in mind that social media analysis is in no way limited to only these four methodologies. Therefore, we also briefly touch on other methodologies and recommend you adopt various other methodologies as you see fit. Be creative and flexible.
So far, the information we have covered, such as the preliminary procedure and the dos and don'ts, applies regardless of the type of methodologies you employ. From now on, the information applies only to specific methodologies, with the exception of a brief overview of variables, a key component of all methodologies. If you are familiar with conducting analyses, you can skip the Variables section.
Variables are symbols that represent a variety of quantitative or qualitative values. Anything that varies can be a variable. For example, the type of illegal drug can be a variable, as can the age of the top Facebook users in Africa. Analysis involves manipulating and comparing variables at different times in different situations to tease out patterns, causal effects, and insights. Understanding variables and how they differ will help you formulate meaningful analyses. Every robust analysis has the following three types of variables:
If the variable types are new or confusing, then consider the following example.
The following sections briefly introduce the methodologies you will learn to use to analyze social media and related data.
Social network analysis (SNA), a type of network analysis, is the study of the social structures known as social networks, which are composed of individuals and their relationships. A social network can consist of the relationship between two people or of the relationships between everyone on Earth. See Figure 3.5 for an example of a social network. Because social media is all about creating and sustaining social networks and relationships between people, understanding SNA is essential to understanding social media.
SNA emerged from the interaction of disparate fields including psychology, graph theory, and statistics, and forms part of the emerging field of network science. Several network science theories, some of which contradict each other, describe how social networks behave. They set aside the specifics of individuals and their attributes in order to generalize about how typical human relationships function. The theories are implemented as algorithms that conduct specialized mathematics on social network data and output specific answers. Therefore, you can consider SNA to be a somewhat deductive process, yet one that also draws from induction.
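A social network is commonly stored as an adjacency list that maps each person to the set of people they are directly connected to. The sketch below builds one from a hypothetical list of relationships (for example, mutual follows) between pseudonymous accounts and computes degree centrality, one of the simplest SNA measures: the fraction of everyone else in the network that a person is directly connected to. Degree centrality is only one of several ways to gauge prominence in a network.

```python
from collections import defaultdict

# Hypothetical relationships (e.g., mutual follows) between pseudonymous accounts.
relationships = [
    ("ana", "ben"), ("ana", "cruz"), ("ben", "cruz"),
    ("cruz", "dee"), ("dee", "eli"), ("eli", "fay"), ("dee", "fay"),
]


def build_network(edges):
    """Turn an edge list into an undirected adjacency list (person -> set of contacts)."""
    network = defaultdict(set)
    for a, b in edges:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def degree_centrality(network):
    """Share of all other people in the network each person is directly connected to."""
    n = len(network)
    return {person: len(contacts) / (n - 1) for person, contacts in network.items()}


network = build_network(relationships)
ranked = sorted(degree_centrality(network).items(), key=lambda item: -item[1])
for person, score in ranked:
    print(person, round(score, 2))
```

In this toy network "cruz" and "dee" score highest because each links a tight-knit trio to the other; dedicated SNA software computes this and many richer measures on networks far too large to inspect by eye.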
SNA enables you to map, measure, and describe almost anything about a social network and its components. SNA can provide information about individuals, a few relationships, or large-scale networks. You can use SNA to understand the ideas of interest of social networks, how individuals gain influence in social networks, how individuals form relationships with others, how the relationships evolve over time, and how the relationships affect the behavior of individuals in the social network. You can also use SNA to determine which individuals are the most influential, which individuals are the most vocal, which relationships are the most influential, and which relationships are necessary to sustain the structure of the network. SNA also enables you to measure the relationships between different types of social networks. For example, with SNA you can measure whether and how an individual's social network on Badoo influences his or her social network in the physical world (friends, family, and so on) and vice versa. SNA is very useful for understanding how violent extremists use social media to develop relationships with at-risk populations, forecasting how the social networks of human traffickers and narcotics smugglers evolve over time, identifying the key individuals and relationships in drug trafficking networks, and much more.
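For the question of which relationships are necessary to sustain the structure of a network, one basic measure is whether a relationship is a bridge: removing it would split the network into disconnected pieces. The brute-force sketch below assumes a small, connected network with no duplicate relationships, and uses hypothetical data in which two tight-knit clusters are joined by a single tie.

```python
from collections import deque

# Hypothetical relationships: two triangles joined by one tie (cruz-dee).
relationships = [
    ("ana", "ben"), ("ana", "cruz"), ("ben", "cruz"),
    ("cruz", "dee"), ("dee", "eli"), ("eli", "fay"), ("dee", "fay"),
]


def is_connected(people, edges):
    """Breadth-first search: can every person reach every other person via the edges?"""
    contacts = {person: set() for person in people}
    for a, b in edges:
        contacts[a].add(b)
        contacts[b].add(a)
    start = next(iter(people))
    seen = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for other in contacts[person] - seen:
            seen.add(other)
            queue.append(other)
    return len(seen) == len(people)


def bridge_relationships(edges):
    """Relationships whose removal would split the network into disconnected parts.

    Brute force: drop each edge in turn and re-check connectivity. This is fine for
    small networks; large ones call for a linear-time method such as Tarjan's
    bridge-finding algorithm.
    """
    people = {person for edge in edges for person in edge}
    return [edge for edge in edges
            if not is_connected(people, [e for e in edges if e != edge])]


print(bridge_relationships(relationships))  # [('cruz', 'dee')]
```

In an investigative setting, a bridge like this one often marks the single relationship through which information, money, or recruits pass between otherwise separate groups.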
The rise of social media, and especially of social networking platforms, has produced gargantuan amounts of data about social networks. Sample social network data for testing network science theories, usually derived from Facebook, is now easy to find on the Internet. The explosion of readily available social network data and sophisticated SNA software preloaded with algorithms is ushering in a golden age of network science. Many theories underlying SNA are now being put to the test like never before, and several companies are creating automated tools to analyze social networks exhibited on social media platforms.
Chapter 5 delineates how to conduct SNA. However, the following describes the overall SNA process:
Language and sentiment analysis (LSA) is the study of patterns in linguistic content, such as Facebook status updates and text messages. It is the application of theories and tools from fields including text analytics, computational linguistics, statistics, and natural language processing (NLP). LSA is actually an umbrella term we use to address a variety of language processing tools and analyses. The various language tools and analyses help identify what individuals and groups are saying, why they are saying it, who is saying it, what they mean exactly by what they are saying, and how they feel about what they are talking about. As with SNA, the explosion of social media data consisting of billions of tweets, status updates, private messages, and texts from people globally has significantly bolstered the use of LSA. Because social media is all about communication, and much of the communication is text-based, understanding LSA is essential to understanding the meaning of communication. LSA is very useful for geolocating and understanding texts from victims of humanitarian crises, determining the identity and likely location of a suspicious blog's author, forecasting the coordination of criminal activity such as planning of violent riots, and much more.
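Among the simplest LSA tools is lexicon-based sentiment scoring: look up each word in a list of words with assigned polarities and add up the scores. The lexicon, negation words, and weights below are hypothetical toy values chosen for illustration; practical work relies on validated lexicons or trained models for the language and platform in question.

```python
import re

# Hypothetical toy lexicon: positive words score above zero, negative words below.
LEXICON = {"safe": 1, "calm": 1, "thanks": 1,
           "fire": -1, "attack": -2, "scared": -2, "trapped": -2}
NEGATORS = {"not", "no", "never"}


def sentiment_score(text):
    """Sum word polarities, flipping the sign of a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


print(sentiment_score("We are trapped and scared, please help"))    # -4
print(sentiment_score("Not scared anymore, we are safe. Thanks!"))  # 4
```

This approach breaks easily. Sarcasm such as "yeah, totally safe here" is scored as positive, and slang or misspellings simply fall outside the lexicon, which is one reason existing tools struggle with informal text.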
LSA involves both inductive and deductive processes. Many existing LSA theories and algorithms process language and output answers with varying reliability. Most LSA tools focus on processing English; few process non-Western and less widely used languages such as Swahili. Existing LSA tools also have difficulty processing unstructured text full of slang, idioms, and sarcasm, such as teenagers' tweets, although advances are being made rapidly. You will use existing LSA tools and algorithms and also create your own through deduction and induction.
Chapter 6 delineates how to conduct LSA. However, the following describes the overall LSA process:
Correlation and regression analysis (CRA) is the study of correlations and/or relationships between anything, or more accurately, between a dependent variable and one or more independent variables. CRA uncovers correlations between seemingly separate things and enables you to determine whether and to what extent the two things either directly or indirectly affect each other, or whether a third thing affects both simultaneously or concurrently. CRA is usually the first type of analysis one learns in a basic statistics course and is used in virtually every field.
Strictly speaking, correlation analysis and regression analysis are different types of analysis. You use correlation analysis when you simply want to find associative relationships between distinct things, regardless of whether they affect each other. For example, sales of winter coats and sales of gloves are correlated: when sales of winter coats spike, sales of gloves tend to spike as well. Note that correlation does not imply causation. Buying a winter coat does not cause a buyer to purchase gloves, or vice versa. Rather, a third factor, cold weather, leads buyers to purchase both a winter coat and gloves.
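The coat-and-glove example can be sketched with a Pearson correlation coefficient computed from scratch. The weekly sales figures and temperatures below are invented for illustration. Both sales series correlate strongly with each other and, inversely, with temperature, but the coefficient alone cannot tell you which variable, if any, is the cause.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient r, from -1 (inverse) to +1 (direct)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented weekly data for illustration only.
avg_temp_c = [-5, 0, 3, 8, 12, 18]
coat_sales = [120, 100, 85, 60, 40, 15]
glove_sales = [200, 170, 150, 105, 70, 30]

print(f"coats vs gloves:      r = {pearson(coat_sales, glove_sales):.3f}")
print(f"temperature vs coats: r = {pearson(avg_temp_c, coat_sales):.3f}")
```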
You use regression analysis when you want to uncover predictive relationships between two things, regardless of whether one causes the other. By a predictive relationship, we mean that a change in the value of one variable can help you forecast how the other variable will change. For example, you can run a regression analysis on data about sales of winter coats and gloves. You can then derive an equation indicating, say, that a 15% increase in glove sales tends to be associated with a 20% increase in coat sales. Regression analysis also tells you how accurate your predictive equation is, for example, how much of the variation in one variable the equation explains.
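A minimal ordinary least squares sketch shows how a regression yields both a predictive equation and a measure of its accuracy, here the coefficient of determination (R²). The function name `fit_line` and the weekly percentage changes are hypothetical; in practice you would likely use an established statistics library rather than hand-rolled code.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared), where r_squared is the share of the
    variation in y explained by the line (1.0 means a perfect fit).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical weekly percentage changes in glove and coat sales.
glove_change = [5, 10, 15, 20, 25]
coat_change = [7, 13, 20, 26, 34]

slope, intercept, r_squared = fit_line(glove_change, coat_change)
predicted = intercept + slope * 15  # forecast coat change for a 15% glove rise
print(f"coat change ~ {intercept:.2f} + {slope:.2f} * glove change (R^2 = {r_squared:.3f})")
print(f"predicted coat change for a 15% glove increase: {predicted:.1f}%")
```

A high R² says the line fits the past data well; it says nothing about whether glove sales drive coat sales.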
CRA is an essential tool for understanding social media data and the insights hidden therein. Expect to use CRA regularly to support other methodologies and to uncover correlations and causal relationships between various things. In the case of CRA, think of social media primarily as the vehicle for delivering data that enables you to uncover intuitive and counterintuitive correlations and relationships between disparate things. CRA is very useful for establishing correlations between microeconomic indicators and weather data to forecast famines, establishing causal relationships between specific messages on social media and violent acts on the ground, determining whether drug smuggling activity in rural areas is causally related to the momentary sentiment of rural populations, and much more.
Chapter 6 delineates how to conduct CRA and further discusses how correlation analysis is different from regression analysis. However, the following describes the overall CRA process:
Volumetric analysis (VA) is a type of CRA that focuses only on discovering associative and predictive relationships between events or behavior and changes in the volume of traffic or activity on social media platforms. Data traffic and activity on social media platforms, hereafter called data volume, ranges from the number of tweets in a day that mention a specific word to the number of texts sent from a specific location. Information about data volume is often easier to collect than content data, because social media platforms like to make such data available to showcase their popularity and success.
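Measuring data volume can be as simple as counting the posts per day that mention a keyword. This sketch assumes posts have already been collected as hypothetical (date, text) pairs; real collection depends on each platform's own access methods and terms, which are not shown here. Days with no matching posts are kept with a count of zero, because drops in volume matter to VA as much as spikes.

```python
import re
from datetime import date

# Hypothetical posts as (posting date, text) pairs.
posts = [
    (date(2012, 5, 1), "Heavy rain again today"),
    (date(2012, 5, 1), "Roads closed because of flood warnings"),
    (date(2012, 5, 2), "Flood water is rising near the market"),
    (date(2012, 5, 2), "Is the flood reaching the bridge?"),
    (date(2012, 5, 3), "Sunny at last"),
]


def daily_keyword_volume(posts, keyword):
    """Count posts per day whose words include `keyword` (case-insensitive).

    Every day that appears in `posts` is reported, with 0 when nothing matched.
    """
    keyword = keyword.lower()
    volume = {day: 0 for day, _ in posts}
    for day, text in posts:
        if keyword in re.findall(r"[a-z']+", text.lower()):
            volume[day] += 1
    return dict(sorted(volume.items()))


for day, count in daily_keyword_volume(posts, "flood").items():
    print(day.isoformat(), count)
```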
We separate VA from CRA because of the powerful insights that analyzing data traffic and activity can reveal about security-related issues. Examining changes in data volume, and focusing on unique spikes or drops, often reveals the presence of unique events. In the security world, unique events are the most important events, because civil wars, terrorist bombings, and natural disasters are not everyday occurrences. VA is very useful for identifying sharp drops in social media activity from a specific location to uncover security threats in the area, correlating spikes in communication between two countries with spikes in drug smuggling activity in those countries, and much more.
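One simple way to flag unusual spikes or drops is to compare each day's volume with a trailing window of previous days. The sketch below uses the median and the median absolute deviation (MAD) rather than the mean and standard deviation, so that a spike still inside the window does not mask a drop that follows it. The seven-day window, the 3.5 threshold, and the daily counts are illustrative assumptions, not recommended settings.

```python
from statistics import median


def flag_anomalies(volumes, window=7, threshold=3.5):
    """Flag days whose volume departs sharply from the preceding `window` days.

    Uses a robust z-score: (value - median) / (1.4826 * MAD), where MAD is the
    median absolute deviation of the trailing window. Returns (index, label)
    pairs with label "spike" or "drop".
    """
    flagged = []
    for i in range(window, len(volumes)):
        history = volumes[i - window:i]
        center = median(history)
        mad = median(abs(v - center) for v in history)
        if mad == 0:
            continue  # a perfectly flat window gives no usable spread
        z = (volumes[i] - center) / (1.4826 * mad)
        if z >= threshold:
            flagged.append((i, "spike"))
        elif z <= -threshold:
            flagged.append((i, "drop"))
    return flagged


# Hypothetical daily post counts from one location.
daily_posts = [100, 104, 98, 101, 97, 103, 99, 102, 310, 100, 96, 101, 4, 99]
print(flag_anomalies(daily_posts))  # [(8, 'spike'), (12, 'drop')]
```

With a plain mean and standard deviation, the 310-post spike would inflate the window's spread enough that the later drop to 4 posts would go unflagged; the median-based version flags both.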
Chapter 6 delineates how to conduct VA. However, the following describes the overall VA process:
We encourage you to regularly discover and apply other types of analysis to social media data. A great way to find interesting and powerful analytical methodologies is to explore other fields such as physics, biology, and ecology. In fact, we started using VA after reading about how ecologists measured changes in population volumes to uncover unique ecological events. VA is actually a term we borrowed from chemistry. Researchers searching for alien life use VA to detect changes in chemical composition that could indicate the presence of life.
To gain a better understanding of SNA and to come across more powerful SNA algorithms, read about the application of network science to other fields. We stumbled onto SNA after reading about neural networks and how network science helps explain the influence of relationships on neurons and vice versa. Likewise, we came to appreciate the power of CRA by reading about complexity science, an emerging field that examines the behavior and structure of complex systems. Geoffrey West, a prominent physicist and complexity scientist, has used variations of CRA to uncover startling relationships among population size, gross domestic product (GDP), income, patents filed, and crime rates for any city, be it in the U.S., China, Europe, or elsewhere.2 Simply put, West found that if you double the size of a city, you get roughly a 15 percent per-capita increase in each of the aforementioned areas, as well as many others. West's findings highlight the underlying strength and influence of citywide social networks.
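Scaling rules of this kind are commonly expressed as power laws, Y = Y0 × N^β, which become straight lines after taking logarithms, so they can be estimated with ordinary regression on log-transformed data. The sketch below fits such a law to synthetic city data generated with an assumed exponent of 1.15; the constant, the populations, and the function name `fit_power_law` are illustrative assumptions, not West's data or method.

```python
import math


def fit_power_law(sizes, outputs):
    """Fit outputs = y0 * sizes**beta by least squares on log-log data."""
    log_x = [math.log(s) for s in sizes]
    log_y = [math.log(o) for o in outputs]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    beta = sxy / sxx  # slope of the log-log line is the scaling exponent
    y0 = math.exp(mean_y - beta * mean_x)
    return y0, beta


# Synthetic cities generated with an assumed exponent of 1.15 (illustration only).
populations = [100_000, 200_000, 400_000, 800_000, 1_600_000]
outputs = [0.002 * p ** 1.15 for p in populations]

y0, beta = fit_power_law(populations, outputs)
print(f"estimated exponent: {beta:.2f}")  # recovers the assumed 1.15
```

Because Y(2N)/Y(N) = 2^β for every N, a power law produces the same proportional change each time size doubles, which is the shape of the rule West describes.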
If your time to study other fields is limited, fear not. Chapter 12 briefly explores exciting but more complex analytical methodologies and tools that you can apply to social media data. They include:
But before you start exploring more advanced methodologies, you need to learn the more basic ones. Chapter 5 continues your education in analysis by teaching you how to conduct social network analysis.
1. Tetlock, P. (2006) Expert Political Judgment: How Good Is It? How Can We Know? Princeton University Press, Princeton, NJ.
2. West, G. (2011) “The Surprising Math of Cities and Corporations.” TED. Accessed: 24 May 2012. http://www.ted.com/talks/geoffrey_west_the_surprising_math_of_cities_and_corporations.html