Management does not have time to sort through massive amounts of data to make business decisions. To be useful, data must be presented in a meaningful way that can be readily understood. This chapter introduces visualizations as a way to present data in a story format that supports better business decisions and results.
Data visualization; Data to insights; Context; Data source; Data format; Data Quality; Interactive visualization; Dimensions; Dashboard; Visualization audience
Data are necessary for business. When data are put into a visualization, they can be used to tell a story. A story is a sequence of events that can show the past, present, and future. Using visualizations to tell a story about data can reveal patterns, trends, and relationships and focus attention on what is important. A visualization can also enable discovery of new information by making it easier to understand the data and the different dimensions hidden in them. Visualizations are changing the way we tell stories about data to provide better information, knowledge, and insights.
Fig. 18.1.1 shows where a visualization can help along the path to gaining deeper insights in the data for decisions and actions that provide business value.
Data can take opinions and turn them into facts. Fact-based decision-making requires not only good data but also the ability to turn the data into useful information and knowledge through analysis. Many decision-makers are not in the role of analyzing or working with data; they require data to be presented in a format for them to make decisions. Visualizations allow data to be explored easily or to be presented in a way where they can be better understood for decisions to be made. When visualizations are created correctly, good decisions can be made that provide business value.
A good visualization is one that is understood by the audience and meets the purpose for creating the visualization. The purpose may be to show analytic results to executive management, to show social media trends to educate the public, or to show findings in the data to improve business performance. If the purpose is to improve business performance, a dashboard can be created through the use of visualizations. The outcome of a good visualization will be improved collaboration, decisions, and actionable insights.
Visualizations can be used to explore the data or to tell a story. They can be simple or more complex depending on the story to be told and the data available. Selecting the appropriate form for the story is important so that the audience can understand it easily. It is also important to add details such as titles and labels to ensure the visualization is interpreted correctly and in the proper context. Fig. 18.1.2 is an example of a bad visualization due to missing titles. Although this visualization includes a scale, it is not clear what is being compared, so it can be misinterpreted or misleading.
Telling a story about the data is both a science and an art. The colors selected for a visualization affect how the story is interpreted. Relying on colors such as red and green as indicators should be avoided because readers who are color-blind may not be able to tell them apart; both shapes and colors should be used. Colors may relate directly to a business, and certain colors can trigger different emotions. Blues can have a calming effect and greens can convey safety, whereas reds can signal danger or energy and purple can suggest power or luxury. Shapes, colors, color hues, and appropriate font sizes should be considered so that the audience can easily understand the story in the right context. Too much information should also be avoided so that the story presented in the visualization is clear rather than cluttered (see Fig. 18.1.3).
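The advice to pair shapes with colors can be sketched as a small style table. This is only an illustration: the status names and marker shapes are hypothetical, and the hex values are taken from the Okabe-Ito palette, a commonly cited color-blind-friendly set that the chapter itself does not mention.

```python
# Colors from the Okabe-Ito palette (an assumption of this sketch).
# The status names and marker shapes are hypothetical examples.
STATUS_STYLE = {
    "on_track": {"color": "#0072B2", "marker": "circle"},    # blue
    "at_risk": {"color": "#E69F00", "marker": "triangle"},   # orange
    "off_track": {"color": "#D55E00", "marker": "square"},   # vermillion
}


def is_redundantly_encoded(styles):
    """True when every status has both a unique color and a unique marker."""
    colors = [style["color"] for style in styles.values()]
    markers = [style["marker"] for style in styles.values()]
    return len(set(colors)) == len(colors) and len(set(markers)) == len(markers)
```

Because each status differs in both color and shape, a reader who cannot distinguish two of the colors can still tell the statuses apart by marker.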
A framework or methodology should be used to create a visualization that is interpreted in a way that brings value to the audience. Too often, a developer has a story in mind, but without using a clear methodology, the solution is not interpreted in a meaningful way or with the right context. Poor decisions can be made if the context is not clear or if the purpose for the visualization was not well defined in the beginning. Fig. 18.1.4 shows a framework that is easy and effective to use when creating a visualization.
The first step to create a good visualization is to define what problem needs to be better understood through analyzing and presenting the data in a visualization solution. This step involves understanding the purpose for the visualization, and who will have access to view or interact with the visualization when it is complete. Different roles in the organization may understand or use the results in different ways. Table 18.1.1 shows some examples of different roles to consider before creating a visualization to understand the business need or purpose.
Table 18.1.1
Role | Purpose or Action |
---|---|
Executive | Corporate strategy decisions |
Chief data officer (CDO) | Influence corporate strategy and define data management strategy |
Business manager | Understand performance |
Data analyst | Immediate response |
Customers or prospects | Inform or educate |
The define step also considers the purpose for the visualization to meet the needs of the audience. Will the visualization be used to inform or educate the audience, or will it be used to influence a decision? Is there an immediate problem to be solved, or is the purpose to explore the data to provide more insights for strategic decisions? To answer these questions, it will be important for the visualization designer to meet with the audience to understand the business need or purpose at the beginning.
The second step to create a good visualization is to understand the data to be used. The data selected should be relevant to the purpose defined in the first step. It is also important to understand what types of data are available, how much data are available, and whether the available data can tell the right story through a visualization.
When it comes to visualizations, data can be categorized into different types. The most common grouping is structured versus unstructured. Data that have been put into a workable format, such as a table with rows and columns or a database, are considered structured. Unstructured data do not fit into a standard workable format and may include text or comments. When working with unstructured data to create a visualization, additional work may be needed first to put the data into a workable format.
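One possible form of that additional work is to count terms in free-text comments so they become rows in a table. The sketch below uses only the Python standard library; the comments and the short stop-word list are hypothetical and exist only to illustrate the idea.

```python
import re
from collections import Counter

# Hypothetical customer comments, used only to illustrate the idea.
COMMENTS = [
    "The room was clean and the staff was friendly.",
    "Staff was slow at check-in, but the room was clean.",
    "Noisy room. Friendly staff though!",
]

# A tiny illustrative stop-word list; real projects use a fuller one.
STOP_WORDS = {"the", "and", "was", "at", "but", "though", "in"}


def tokenize(text):
    """Lower-case a comment and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_table(comments):
    """Return (term, count) rows sorted by count, then alphabetically."""
    counts = Counter(
        word
        for comment in comments
        for word in tokenize(comment)
        if word not in STOP_WORDS
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


rows = term_table(COMMENTS)
```

The resulting rows have one term and one count each, which is the row-and-column shape that a bar chart or similar visualization can consume.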
Too often, companies are data-rich but information-poor. This is usually the case when there are a lot of data, but they reside in many different places and do not integrate well enough to be useful. For example, data may be in a spreadsheet, text file, or database. To create a visualization, data can be gathered from many different sources, but it's important to understand how the different data sets are related. Not all of the data gathered will be used or important; which data matter can be determined while creating the visualization. Data sources may be internal to the company or external, such as publicly available data. Depending on the visualization software used, additional data such as maps might be provided to enrich the visualization. An example that combines public review data from the Internet with maps using Qlik Sense is shown in Fig. 18.1.5. This example uses bubble size on a map to show which locations have higher volumes of data.
Other examples of data sources include the following:
The data must be organized to create a visualization, which means they must be put into a workable format. Most visualization tools provide detailed information on how to manage data in the application or how to connect different data sources. Best practice is to organize the data into a row-and-column (table) format. Each value in a column should use the same unit of measure. For example, Table 18.1.2 shows airline flight data in a row-and-column format with a consistent unit of measure. Time data must also be consistent. For example, dates should all follow one format, such as MMDDYYYY, to be visualized correctly.
Table 18.1.2
Year | Airline | Domestic Flights | International Flights | Total Flights |
---|---|---|---|---|
2017 | Southwest | 1,313,573 | 34,308 | 1,347,881 |
2017 | American Airlines | 886,803 | 193,145 | 1,079,948 |
2017 | Delta | 917,231 | 144,295 | 1,061,526 |
2017 | United | 580,293 | 167,578 | 747,871 |
2017 | JetBlue | 291,995 | 62,369 | 354,364 |
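The formatting rules above can be sketched with the values from Table 18.1.2. This is a minimal illustration that uses only the Python standard library: the pipe-delimited text stands in for a real file, and the helper names (`load_flights`, `inconsistent_rows`, `to_mmddyyyy`) and the list of accepted input date formats are assumptions of this sketch.

```python
import csv
import io
from datetime import datetime

# Table 18.1.2 as delimited text; a real project would read this from a file.
RAW = """Year|Airline|Domestic Flights|International Flights|Total Flights
2017|Southwest|1,313,573|34,308|1,347,881
2017|American Airlines|886,803|193,145|1,079,948
2017|Delta|917,231|144,295|1,061,526
2017|United|580,293|167,578|747,871
2017|JetBlue|291,995|62,369|354,364
"""

COUNT_COLUMNS = ("Domestic Flights", "International Flights", "Total Flights")


def load_flights(text):
    """Return one dict per row with flight counts converted to integers."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter="|"):
        row = {"Year": int(record["Year"]), "Airline": record["Airline"]}
        for column in COUNT_COLUMNS:
            # Strip thousands separators so every value is a plain count.
            row[column] = int(record[column].replace(",", ""))
        rows.append(row)
    return rows


def inconsistent_rows(rows):
    """Return airlines whose parts do not add up to the stated total."""
    return [
        r["Airline"]
        for r in rows
        if r["Domestic Flights"] + r["International Flights"] != r["Total Flights"]
    ]


def to_mmddyyyy(value, known_formats=("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")):
    """Normalize a date string in one of the known formats to MMDDYYYY."""
    for fmt in known_formats:
        try:
            return datetime.strptime(value, fmt).strftime("%m%d%Y")
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized date format: {value!r}")


flights = load_flights(RAW)
```

Calling `inconsistent_rows(flights)` returns an empty list for this table, because each airline's domestic and international counts add up to its total. Dates that arrive in mixed formats can be passed through `to_mmddyyyy` before charting so that every value follows one convention.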
Depending on the story to tell, skills and knowledge in statistics may be needed. More complicated visualizations can use calculations to show the results of the analysis. Although visualizations can tell a story using good data, they can also distort reality, depending on how the data are presented. With line or bar charts, take care not to truncate the bottom of the axis, which makes differences between data points appear larger than they are. Also take care with scaled marks such as bubbles of different sizes, to ensure they are drawn at the correct scale for comparison.
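Both cautions can be checked with simple arithmetic. The sketch below compares how large a difference looks when a bar chart axis starts at zero versus at a truncated value, and sizes bubbles so that area rather than radius tracks the value. The function names and the 1,000,000 axis start are illustrative assumptions; the two totals come from Table 18.1.2.

```python
import math


def apparent_ratio(larger, smaller, axis_start=0.0):
    """Ratio of drawn bar lengths when the axis begins at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)


def bubble_radius(value, max_value, max_radius=40.0):
    """Radius that makes bubble AREA, not radius, proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("values must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)


# Total flights for two airlines from Table 18.1.2.
american, delta = 1_079_948, 1_061_526

true_ratio = apparent_ratio(american, delta)                   # axis starts at zero
distorted_ratio = apparent_ratio(american, delta, 1_000_000)   # truncated axis
```

With the axis at zero, the American bar is under 2% longer than the Delta bar. With the axis starting at 1,000,000, it is drawn roughly 30% longer, even though the data are unchanged. Scaling bubble radius by the square root of the value avoids a similar exaggeration, because readers tend to compare bubbles by area.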
Data quality is important for a good visualization. Good data include data that are complete, clean, not questionable or conflicting, and valid. Quality data can lead to better decisions and better visualizations. There are different dimensions of data quality to be considered including the following:
Data can be collected from many different places. Before designing a visualization, it's important to understand the data that will be used. Data can be structured, such as a customer name and location, or unstructured, such as a customer comment or a phone call transcribed to text. When collecting the data, it's important to understand how different data sets are related. For example, if structured customer data and unstructured customer comments are going to be used, how are they related? What will be communicated through the visualization, and what kind of story will be told? Once these questions are answered, the right type of visualization can be chosen.
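One common way two such data sets are related is through a shared key. The sketch below assumes a hypothetical `customer_id` field that appears in both the structured customer records and the unstructured comments. All names and records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical structured customer records keyed by customer_id.
CUSTOMERS = [
    {"customer_id": 1, "name": "A. Rivera", "location": "Denver"},
    {"customer_id": 2, "name": "B. Chen", "location": "Austin"},
    {"customer_id": 3, "name": "C. Patel", "location": "Denver"},
]

# Hypothetical unstructured comments that carry the same customer_id.
CUSTOMER_COMMENTS = [
    {"customer_id": 1, "text": "Great stay, friendly staff."},
    {"customer_id": 3, "text": "Room was noisy."},
    {"customer_id": 3, "text": "Check-in took too long."},
    {"customer_id": 9, "text": "No matching customer record."},
]


def comments_by_location(customers, comments):
    """Group comment text by customer location using the shared key.

    Returns (grouped, unmatched): grouped maps location -> list of comments,
    unmatched lists comments whose customer_id has no customer record.
    """
    location_of = {c["customer_id"]: c["location"] for c in customers}
    grouped = defaultdict(list)
    unmatched = []
    for comment in comments:
        location = location_of.get(comment["customer_id"])
        if location is None:
            unmatched.append(comment)
        else:
            grouped[location].append(comment["text"])
    return dict(grouped), unmatched


grouped, unmatched = comments_by_location(CUSTOMERS, CUSTOMER_COMMENTS)
```

Keeping the unmatched comments separate, rather than silently dropping them, makes it visible when the two sources do not line up, which is worth knowing before a story is built on the combined data.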
The concept of using a visualization to represent data has been around for hundreds of years. Today, with advancements in technology and in business intelligence (BI) capabilities, there are many tools available to help create a visualization. Technology has made it possible to process large amounts of data quickly, and it may continue to advance how visualizations are created, perhaps through audio describing what a user wants to see or through machine learning. No matter how the creation of visualizations evolves, there are fundamentals that are important to understand. When it comes to design, the most important fundamental is to ensure the user understands the context of the visualization. Before the design step, it's important to have followed the methodology and completed the define and data steps. Choosing the appropriate chart requires an understanding of the data properties and the purpose of the visualization.
When the business need or problem is understood and the data have been gathered, the visualization can be designed. There are many different forms of visualizations that can be used depending on the data, but choosing the right visualization to improve the user experience in telling the story is important. All visualizations should include not only the visual that represents the data but also additional information such as labels and text so the audience can understand the content and the context. Table 18.1.3 shows some basic forms of visualizations that can be used. Some of these charts can be enhanced; for example, a time element can be used for a bubble chart to show changes over time. Examples for some common basic charts will be discussed. However, there are many different forms of visualizations that should be reviewed before designing a visualization.
Table 18.1.3
Visualization Form | Number of Categories | Number of Numerical Variables | Purpose | Audience Ease of Interpretation | Example |
---|---|---|---|---|---|
Number chart | | 1 | Display | Easy | Average rating or score |
Pie chart | 1 | 1 | Proportion comparison | Easy | % of negative sentiment by company |
Bar chart (basic) | 1 | 1 | Showing exact values | Easy | Top consumer complaints about Equifax in a given period of time |
Bar chart (grouped side by side) | Multiple | 1 or 2 | Compare categories | Easy | Compare hotels grouped by hotel ratings |
Bar chart (stacked) | Multiple | 1 | Compare categories | Easy | Compare hotels by online customer review sentiment |
Line (single) | 1 | 1 + Date variable | Trends over time | Easy | Sales over time |
Line (multiple) | Multiple | 1 + Date variable | Compare multiple categories over time | Difficult | Consumer sentiment over time for each credit bureau |
Maps | Multiple | Multiple | Comparing variables and geospatial analytics | Difficult | Location and volume of customer complaints |
Scatter chart | 0 or 1 | 2 | Relationships and correlations between numerical values | Difficult | Relationship between cancer rates and country |
Bubble chart | 0 or 1 | 3 | Relationships and correlations between numerical values | Difficult | Comparing airlines by assets, revenue, and profit |
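As a rough aid, the shape-of-data columns in Table 18.1.3 can be encoded as a lookup. The function below is a sketch rather than a rule: its name and parameters are hypothetical, treating a number chart as having zero categories is an assumption of this sketch, and maps only make sense when the data include locations.

```python
def suggest_charts(categories, numeric_variables, has_date=False):
    """Return candidate chart forms for a data shape, following Table 18.1.3.

    categories: number of categorical fields (0, 1, or more).
    numeric_variables: number of numerical fields to plot.
    has_date: True when the data include a date or time field.
    """
    grouped = "Bar chart (grouped side by side)"
    if has_date and numeric_variables == 1:
        return ["Line (single)"] if categories <= 1 else ["Line (multiple)"]
    if categories > 1:
        if numeric_variables == 1:
            return [grouped, "Bar chart (stacked)"]
        if numeric_variables == 2:
            return [grouped, "Maps"]
        return ["Maps"]  # only sensible when the data include locations
    # Zero or one category from here on.
    if numeric_variables == 1:
        if categories == 0:
            return ["Number chart"]
        return ["Pie chart", "Bar chart (basic)"]
    if numeric_variables == 2:
        return ["Scatter chart"]
    if numeric_variables == 3:
        return ["Bubble chart"]
    return []
```

For example, `suggest_charts(1, 1)` returns the pie chart and the basic bar chart, the two forms the table lists for one category and one numerical variable. The final choice still depends on the audience and purpose identified in the define step.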
The most common visualization is a simple number chart. A number chart as shown in Fig. 18.1.6 is a good visual for a dashboard to easily communicate any total such as a count, a percentage, an average, or a dollar amount. Trend indicators can also be used in a number chart but should represent the same period of time (such as annual, quarterly, daily, or monthly).
Pie charts have been around for hundreds of years and show part-to-whole relationships at a single point or period in time (such as a slice of the pie vs. the whole pie). They are a simple way to visualize comparisons within a single category; however, they do not work well for comparing segment sizes across multiple pie charts. A pie chart splits a population of data for a single category into segments, and the segments total 100%. Pie charts do not work well with too many segments, because the segments become difficult to label and the differences in proportion become hard to see. A pie chart can also take a lot of space on a dashboard or report. Fig. 18.1.7 shows an example of a pie chart where the category is ratings for a hotel. Ratings are segmented 1 through 5, and the pie chart shows the percentage of each segment.
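The arithmetic behind a pie chart can be sketched as follows. The rating counts are hypothetical, and folding the smallest segments into an "Other" slice is one possible way to handle too many segments, not a technique prescribed by the chapter.

```python
def pie_segments(counts, max_segments=5, other_label="Other"):
    """Convert raw counts to percentage shares, folding small segments together.

    counts: dict mapping a segment label to a non-negative count.
    Returns a list of (label, percent) pairs whose percents total 100.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > max_segments:
        # Keep the largest segments and merge the rest into one slice.
        kept = ranked[: max_segments - 1]
        remainder = sum(count for _, count in ranked[max_segments - 1 :])
        ranked = kept + [(other_label, remainder)]
    return [(label, 100.0 * count / total) for label, count in ranked]


# Hypothetical review counts per star rating for one hotel (not from the text).
ratings = {"5 stars": 120, "4 stars": 60, "3 stars": 10, "2 stars": 6, "1 star": 4}
segments = pie_segments(ratings)
```

With five ratings and the default limit of five segments, nothing is folded and each rating keeps its own slice. Lowering `max_segments` merges the smallest ratings into a single slice while the shares still total 100%.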
A bar chart is used for comparison ranking across one or multiple categories. There are different types of bar charts, and choosing the best one will depend on the data available. A simple bar chart is easy to interpret and can be used to show totals or trends for a single category. Fig. 18.1.8 shows an example of a simple bar chart.
A stacked bar chart can be used to show totals for a single category or to compare categories when there is more than one. For example, Fig. 18.1.9 uses public data to show the number of scheduled flights in 2017 for US airlines, split into domestic and international, in one stacked bar chart. Stacked bar charts work well for survey responses or any type of data that has multiple categories.
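How the segments of a stacked bar are positioned can be sketched with the flight counts from Table 18.1.2. The text rendering below is only a stand-in for a real charting tool, and the scale of 50,000 flights per character is an arbitrary assumption.

```python
# Domestic and international flights per airline, from Table 18.1.2.
FLIGHTS = {
    "Southwest": (1_313_573, 34_308),
    "American Airlines": (886_803, 193_145),
    "Delta": (917_231, 144_295),
    "United": (580_293, 167_578),
    "JetBlue": (291_995, 62_369),
}


def stacked_segments(domestic, international):
    """Return (start, end) positions for each segment of one stacked bar."""
    return {
        "Domestic": (0, domestic),
        "International": (domestic, domestic + international),
    }


def text_bar(domestic, international, flights_per_char=50_000):
    """Render one stacked bar as text: '#' domestic, '=' international."""
    return "#" * round(domestic / flights_per_char) + "=" * round(
        international / flights_per_char
    )


def render_chart(data):
    """Build a labeled text chart with one stacked bar per airline."""
    width = max(len(name) for name in data)
    return "\n".join(
        f"{name:<{width}} | {text_bar(*parts)}" for name, parts in data.items()
    )


chart = render_chart(FLIGHTS)
```

Each international segment starts where the domestic segment ends, so the full bar length represents the airline's total flights. Running `print(chart)` shows one bar per airline for a quick visual comparison.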
A horizontal bar chart works well if the category labels are long. Although the data presented are similar to those in a simple or stacked bar chart, a horizontal bar chart may be selected to display the labels better or to fit the space where it will be displayed. It may be chosen over the other types of bar charts to better tell the story with the data available (Fig. 18.1.10).
Another basic form of visualizing data is the line chart. Line charts require time data at consistent intervals. Fig. 18.1.11 shows an example of a multiple line chart, in which multiple categories are plotted over time. The variable being plotted is customer sentiment for three different companies. This type of chart is not well suited to a static visualization, such as a PowerPoint presentation, because it can be too cluttered. However, in a visualization tool such as Qlik Sense, the audience can interact with the chart and select a custom time range to drill down into more detail. This chart combined with others in an interactive visualization can be very powerful for exploring the data to tell a story.
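The requirement for consistent intervals can be checked before plotting. The sketch below assumes monthly data keyed by the first day of each month and marks missing months with `None` so a gap stays visible instead of being drawn as a straight line between distant points. The sentiment scores are hypothetical.

```python
from datetime import date


def month_range(start, end):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_monthly_gaps(series):
    """Return a month-by-month series, with None where a month has no data.

    series: dict mapping a date (first of month) to a value.
    """
    if not series:
        return []
    months = sorted(series)
    return [(m, series.get(m)) for m in month_range(months[0], months[-1])]


# Hypothetical average sentiment scores; March 2017 is missing on purpose.
sentiment = {
    date(2017, 1, 1): 0.42,
    date(2017, 2, 1): 0.38,
    date(2017, 4, 1): 0.51,
}
regular = fill_monthly_gaps(sentiment)
```

The result has one entry per month from January through April, with `None` for March. Most charting tools can then either leave a break in the line or let the designer decide how the missing period should be handled.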
Another type of chart to compare different variables is a scatterplot or bubble chart. A bubble chart is a good way to show three numerical variables in one chart, but it is more complicated and requires more skill to create. Different colors or bubble sizes can be used to show a lot of information in a single chart. A bubble chart looks at data in a snapshot of time. However, by plotting snapshots of data from different periods of time, the chart can be animated to show changes in the data in an interesting form.
A data visualization is a way to tell a story through a graphic representation of the data and a way to share the story among both technical and nontechnical people. The last step when the visualization is complete is to distribute the visualization. There are many ways a visualization can be shared or distributed. It's important to consider this step before you design your visualization as the purpose will define how it should be distributed. Is the purpose for the visualization to inform or to allow data discovery? Will your audience view only or interact with the visualization to discover insights?
To inform or educate the audience, the story should unfold by showing the data and visualizations in an order that tells the story. For example, if data are collected to understand customer sentiment about their hotel stay, a visualization can be created to show the customer sentiment over time and put into a story format. Consider using data from the past, present, and predicted future to tell stories for the best outcome or decisions.
Visualizations can be shared or distributed to inform or educate the audience in different ways that may include the following.
If the purpose for the visualization is to explore the data, then an interactive visualization can be valuable. How an interactive visualization is distributed depends on the software used. Most visualization tools can publish the visualization to the Internet (cloud) so the user can interact with and explore the data. With defined user permissions, the user can change different variables while all charts in the story update. Interactive visualizations are well suited to asking “what if” questions and seeing the outcome visually.
The practice of creating visualizations is rapidly growing just as machine learning, digital facial recognition, unstructured data analytics, and data science are growing. There are many smart and user-friendly tools available for creating visualizations. Selecting the appropriate tool will depend on many factors, including the knowledge, skills, and abilities of the visualization producer. Some features to consider when selecting a tool include the following:
Here are some of the leading tools on the market today for creating visualizations without requiring detailed programming skills:
There is great value in the process of creating and telling a story through visualizations. The visualization framework is the best methodology to use to ensure visualizations are created with the right content and can be understood in the right context. The process of defining the purpose and talking with the audience, collecting the data, designing the visualization in a story format, and distributing the visualization allows data to be more easily understood for the audience to focus on what is important. Using visualizations to tell a story through data is a great way to provide better information, knowledge, and insights. Telling a story through visualizations will continue to be necessary moving forward to enable data to be better understood for more accurate outcomes and decisions.