16
Advanced Reporting and Visualization with Third-Party Tools

In this chapter, we turn our focus to advanced reporting techniques, including data extraction and cleanup, as well as reporting automation. We’ll also continue our discussion of data integration from Chapter 15.

To put it bluntly, advanced reporting—and report automation specifically—can be a bit of a bird’s nest. Advanced reporting is a series of sometimes very manual steps that combine to form a report/chart/dashboard or enable deeper ad hoc analysis. The end product, which may look simple and elegant, is the outcome of a complex process.

With all the data being collected and all the different segmentation features available in GA, you would think that reporting and dashboarding data would be a problem solved long ago. Advanced reporting, however, remains one of the biggest challenges facing organizations today. Certainly, GA (among other tools) makes it easy to construct a basic report. Designing a reporting solution, however, that takes data sampling and automation into consideration is an entirely different matter. An effective report has the following characteristics:

  • It takes data accuracy into consideration (no sampling).
  • It has an elegant and simple presentation.
  • It is automated so it doesn’t require hours and hours of construction time.
  • It is delivered at strategic intervals in an easy-to-consume format.
  • Most importantly, it provides business insight.

Producing a report or dashboard that excels in all the above areas can be daunting for many organizations. Analysts often spend countless hours manually copying data from various analytics tools, pasting it into Excel or PowerPoint, cleaning it up, and then building an unusable and unattractive report, which is then emailed to stakeholders, usually without additional context about the data.

The reality is that it takes a significant amount of time and effort to work with the data contained in analytics tools, extract it into a format we can work with, make it report-friendly so that report viewers will understand it, and then share it with our users through automation. Remember the time you put together a report of your Top 10 Landing Pages and the number 1 viewed page was “/”? The analysts, of course, know that “/” represents your home page, but this isn’t something that most people, particularly executives, will know or even should be expected to know. If you have not already rewritten “/” in your GA view settings as described in Chapter 9, rewriting it now to a more report-friendly name that management can easily interpret will have a profound impact. It’s our responsibility as architects of the reporting solution to construct the report in a way that is easy to consume for its intended audience. Sounds simple enough, right? Well, it kind of is, but combine this with the many other similar data inconsistencies that exist, and it’s not so straightforward to do this in an automated way.

At a recent conference on sports analytics, one of the speakers was the Director of Analytics for an NBA (the U.S. National Basketball Association) team. In addition to offering many valuable analytics insights, he said, “What do you think I spend 90% of my time doing? I’m basically cleaning up data in spreadsheets.” That really cemented for me just how important it is for us to leverage tools that can automate as much of the data extraction, cleanup, and visualization process as possible. Analysts shouldn’t be spending their time copying and pasting data, renaming data, and building graphs and charts. Analysts should spend their time mining for insights and discovering trends and patterns that can impact the business.

In this chapter, we cover how to get data out of GA, the limitations of each method, and the factors to consider along the way to help guide us through the maze. At the end of the chapter we discuss three advanced use cases for extracting and visualizing GA data.

Framing the Issue: How to Get Data Out of GA

There are several ways of getting data out of GA. We’ll focus on the methods that are most effective in supporting our goal of automated reporting.

Core Reporting API

The GA Core Reporting API (https://developers.google.com/analytics/devguides/reporting/core/v3/) provides a basic querying system to request metrics and dimensions in the form of tabular data from GA. It can access the majority of report data contained within GA. The returned data looks and feels similar to what you would see within the interface, and enables you to build dashboards and automate reporting tasks outside of the GA interface. For example, you can leverage the API to extract GA data and integrate it into a Web page, Excel, or any other application. Note that the Core Reporting API is the only automated extraction method available to GA Standard users.

Following is an example of a query to the Core Reporting API. In this example, we are querying for the top 10 Channels during the last 30 days.

https://www.googleapis.com/analytics/v3/data/ga
  ?ids=ga%3A73156703
  &start-date=30daysAgo
  &end-date=yesterday
  &metrics=ga%3Asessions%2Cga%3AbounceRate
  &dimensions=ga%3AchannelGrouping
  &sort=-ga%3Asessions
  &max-results=10

The result of this query is shown in Figure 16.1.

Figure 16.1 Output of a typical query to the Core Reporting API.
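
If you prefer scripting to constructing raw URLs, the same query can be issued through Google’s client libraries. Below is a minimal Python sketch using the google-api-python-client library; the service account key file name is a placeholder, and the view (profile) ID is the one from the example query above.

from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

# Authenticate with a service account key (placeholder file name); the
# service account must be granted read access to the GA view.
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    'service-account-key.json',
    ['https://www.googleapis.com/auth/analytics.readonly'])
analytics = build('analytics', 'v3', credentials=credentials)

# Same query as above: top 10 channels by sessions over the last 30 days
response = analytics.data().ga().get(
    ids='ga:73156703',
    start_date='30daysAgo',
    end_date='yesterday',
    metrics='ga:sessions,ga:bounceRate',
    dimensions='ga:channelGrouping',
    sort='-ga:sessions',
    max_results=10).execute()

for row in response.get('rows', []):
    print(row)  # [channel, sessions, bounce rate]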

Unsampled Request API

The Unsampled Request API, available to Analytics 360 only, differs from the Core Reporting API in that it allows you to access unsampled data. While the Core Reporting API allows you to dynamically combine metrics and dimensions, the Unsampled Request API instead allows you to access predefined unsampled reports in comma-separated values (CSV) format.
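
To give a feel for the mechanics, here is a hedged Python sketch that requests an unsampled report through the Management API; the account, property, and view IDs are placeholders, and analytics is the authorized service object from the previous sketch. Because the report is generated asynchronously, you poll for completion and then retrieve the CSV from Google Drive or Cloud Storage.

# Request an unsampled report; Google processes it asynchronously.
report = analytics.management().unsampledReports().insert(
    accountId='12345678',            # placeholder account ID
    webPropertyId='UA-12345678-1',   # placeholder property ID
    profileId='73156703',            # placeholder view (profile) ID
    body={
        'title': 'Unsampled Channels Report',
        'start-date': '2015-01-01',
        'end-date': '2015-12-31',
        'metrics': 'ga:sessions,ga:bounceRate',
        'dimensions': 'ga:channelGrouping',
    }).execute()

# Poll until the report is ready; when status is COMPLETED, the response's
# driveDownloadDetails (or cloudStorageDownloadDetails) points to the CSV.
status = analytics.management().unsampledReports().get(
    accountId='12345678',
    webPropertyId='UA-12345678-1',
    profileId='73156703',
    unsampledReportId=report['id']).execute()
print(status['status'])  # PENDING, then COMPLETED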

Third-Party Tools

Analytics Canvas

Analytics Canvas is a tool that can be used to automate the extract, transform, load (ETL) process. It can connect directly to GA via the Core Reporting API as well as the Unsampled Request API, and to GA data that has been stored in BigQuery. It can also connect to many types of databases and to Excel files. Data can then be modified as needed and automatically uploaded to a database of your choice.

The software requires a Windows-based computer with access to all data sources. You would set up Analytics Canvas to connect to those sources, manipulate the data, and then automatically push it to the desired output location.

Other tools that offer more advanced but conceptually similar functionality to Analytics Canvas include Informatica, SQL Server Integration Services (SSIS), IBM InfoSphere, SAP Data Services, and many others. Analytics Canvas, however, is primarily designed to work with analytics and marketing data sources and is well suited for GA purposes.

Tools such as SSIS are good at extraction and transformation from internal databases, but either don’t offer connections to GA/BigQuery or offer limited functionality (with respect to GA).

Analytics Canvas can connect to the GA data using three methods:

  • Core Reporting API
  • Unsampled Request API
  • BigQuery

Not only does Analytics Canvas offer direct connectivity to GA, but since it’s designed primarily for analytics data, it offers the ability to extract unsampled data, something that other ETL tools typically don’t.

For GA Standard users who are restricted to the Core Reporting API, Analytics Canvas leverages a creative method of addressing data sampling called Query Partitioning. This effectively segments a query into smaller chunks to reduce the number of sessions within each chunk. For example, if your reporting period were one year, Query Partitioning would split the query into 12 smaller queries. The process is transparent to the user, other than that the query takes longer to execute. It’s a clever way of mitigating the effects of sampling, but as the word mitigate suggests, it only reduces the effects of sampling; it doesn’t eliminate them. Depending on the volume of data being reported on, Query Partitioning may have a major or perhaps only a marginal effect on your data.
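
Canvas’s internal implementation isn’t public, but the concept is easy to sketch. The following Python fragment (reusing the hypothetical analytics service object and view ID from the Core Reporting API sketch earlier) partitions a one-year query into twelve monthly queries and sums the results, so each individual query covers fewer sessions and is less likely to cross the sampling threshold.

from datetime import date, timedelta

def month_ranges(year):
    # Yield (start, end) ISO date pairs, one per calendar month
    for month in range(1, 13):
        start = date(year, month, 1)
        nxt = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        yield start.isoformat(), (nxt - timedelta(days=1)).isoformat()

total_sessions = 0
for start, end in month_ranges(2015):
    response = analytics.data().ga().get(
        ids='ga:73156703',
        start_date=start,
        end_date=end,
        metrics='ga:sessions').execute()
    total_sessions += int(response['totalsForAllResults']['ga:sessions'])

# Additive metrics such as sessions can be summed across partitions; ratios
# such as bounce rate must be recomputed from their underlying components.
print(total_sessions)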

A sample of a “Canvas” is shown in Figure 16.2.

Figure 16.2 The visual blocks within the canvas make it extremely easy to follow the extraction and transformations occurring within your data set.

BigQuery

BigQuery is a querying tool that leverages the power and speed of Google’s cloud infrastructure to store and query billions of rows of data in seconds. Initially developed as an internal tool within Google’s own technology stack, it is amazingly fast and enables complex data processing and segmentation.

BigQuery is a different product entirely from GA—the two aren’t directly related. However, there is an integration available between Analytics 360 and BigQuery. As part of the integration, Analytics 360 exports data to BigQuery on a nightly basis. Not only is the majority of existing GA data available in BigQuery, but BigQuery’s robust structure enables us to see deeper, more granular hit-level data. This additional layer of data enables us to understand and analyze user behavior with respect to the sequence in which activities were performed in a session.

BigQuery is one of several storage options within the Google Cloud Platform, as shown in Figure 16.3.

Figure 16.3 The Google Cloud Platform

Is BigQuery a Relational Database?

Although BigQuery is designed to run SQL-like queries, it cannot be considered a relational database management system. Relational Database Management Systems (RDBMSs) such as Oracle, SQL Server, DB2, or MySQL are designed to efficiently perform all CRUD operations (create, read, update, delete); BigQuery is designed primarily for storage and the read operation—specifically, to run faster queries on extremely large data sets while avoiding the sampling that an RDBMS may apply.

We explore some advanced use cases for BigQuery later in this chapter.

For the datasets, tables, rows, and columns available for GA data exported to BigQuery, see “BigQuery Export Schema” in the GA help docs:

https://support.google.com/analytics/answer/3437719?hl=en

Tableau

Tableau is our recommended tool for visualizing reports. It is one of the leading data visualization tools on the market today and is extremely powerful for dashboard creation, ad hoc analysis, and building a self-service reporting solution.

Tableau has three components:

  • Tableau Reader: This is freely available for download and install. Its only function is to enable the user to view Tableau report files, similar to Adobe Reader for PDF files. If a file-sharing report distribution system is used, end users must install Tableau Reader to open the native-format Tableau files. Tableau Reader is available in both Windows and Mac versions.
  • Tableau Desktop: This is the report-building environment of the Tableau suite. A select group of users who are well trained in Tableau will typically have access to this software and will be able to view the raw data from each data source. Within this environment, report builders can either perform ad hoc analysis, or build reports/dashboards/visualizations to be shared as files (opened by Tableau Reader above) or published to the Web-based Tableau Server (mentioned below). Tableau Desktop is available in both Windows and Mac versions.
  • Tableau Server: The Web-based Tableau Server acts as a publishing environment for Tableau reports. This software would be installed on a central Web server within your network. Report builders build reports and publish them to this server, to be consumed by end users. End users log in through the browser (since it is Web-based, eliminating the need for Tableau Reader) and view reports/dashboards/visualizations as allowed by a role-based permissions model. (Tableau Server is currently available for Windows servers only.)

What Factors Dictate Which Tool to Use?

A considerable factor in devising a reporting solution is resolving the issue of the many disparate data sources that cannot expose data in a way that allows it to be pulled automatically. To facilitate an end-to-end automation process, we recommend the acquisition of a middle-layer tool called Analytics Canvas (aka Canvas), described earlier.

With all these tools available, it can of course be hard to determine which one to use. There are several factors that impact the solution we should choose, but one of the primary factors is sampling, discussed in Chapter 11. Since sampling in GA directly affects the quality and integrity of your data, Table 16.1 may help in determining which tools to use, or at least in eliminating options.

Table 16.1 A Summary of Available Data Extraction Methods/Tools, and Ideal Uses

  • Core Reporting API: Subject to the same sampling thresholds as the GA Standard interface (500,000 sessions within the reporting period); a maximum of 10,000 rows is returned per query. Tools such as Canvas can help mitigate (not eliminate) the effects of sampling by using the Query Partitioning feature.
  • Unsampled Request API (Analytics 360 only): Provides access to preconfigured unsampled reports. A great way of exporting unsampled data, but limited access makes it a bit cumbersome: unsampled reports must either be manually downloaded from within the tool (or emailed), or accessed as CSV files via a Google Drive account.
  • BigQuery (Analytics 360 only): Provides access to unsampled hit-level data. Can be used in two models: (1) as a data hub where you can upload additional data sources and join them with GA data; (2) as a vehicle for accessing unsampled hit-level GA data.
  • Analytics Canvas: Can connect to the Core Reporting API, to the Unsampled Request API for fully unsampled data, and to BigQuery. Works in tandem with a visualization tool; by itself, it only provides a facility to extract and transform data.
  • Tableau: Contains a connector to access GA data, but it is prone to significant sampling issues. A much better solution is to feed data to Tableau via integration with Analytics Canvas, or to use the automated export from Analytics 360 to BigQuery and connect to BigQuery from Tableau.

ETLV–The Full Reporting Automation Cycle

ETL stands for extract, transform, load. It is a BI-oriented process for loading data from source systems into a target system to enable business reporting. We added a V at the end to make this process a little more current and complete. The V stands for visualize, of course!

The overall solution will function as follows:

  1. Extract the data.
  2. Transform the data.
  3. Load the data into a reporting platform.
  4. Visualize.

This process is illustrated broadly in Figure 16.5.

Figure 16.5 Data flow through ETLV.
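
To make the four steps concrete, here is a compressed Python sketch of the cycle; the query, the report-friendly renaming, and the output file are all illustrative, and in practice a tool such as Analytics Canvas would own the middle steps.

import pandas as pd

# 1. Extract: pull rows via the Core Reporting API (the analytics service
#    object is the one built in the earlier sketch)
response = analytics.data().ga().get(
    ids='ga:73156703', start_date='30daysAgo', end_date='yesterday',
    metrics='ga:sessions', dimensions='ga:landingPagePath').execute()
df = pd.DataFrame(response['rows'], columns=['Landing Page', 'Sessions'])

# 2. Transform: fix types, apply report-friendly names, keep the top 10
df['Sessions'] = df['Sessions'].astype(int)
df['Landing Page'] = df['Landing Page'].replace({'/': 'Home Page'})
df = df.sort_values('Sessions', ascending=False).head(10)

# 3. Load: write to a location the reporting layer can read
df.to_csv('landing_pages.csv', index=False)

# 4. Visualize: point Tableau (or Excel, Data Studio, etc.) at the output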

There are several factors that need to be considered in the ETLV process to extract/obtain data from heterogeneous data sources, modify (transform) it, and then load it into a data-reporting tool for visualization in an automated way. We’ve broken down some of the factors in Table 16.2.

Table 16.2 Factors at Each Stage of an Automated ETLV Reporting Solution

Phase: Extract
  • Data Sources: A typical business may need to pull data from platforms such as Google Analytics, WebTrends, or Adobe SiteCatalyst; marketing automation tools like Marketo or Eloqua; CRM tools like Salesforce; display advertising data such as AdWords, DoubleClick, and AdMob; e-commerce data; app data (from the Apple App Store or Google Play); or any number of internal databases.
  • Format: The format the data is stored in, and how it is exposed, will dictate the method we use to extract it.
  • Frequency: How often is the data made available or refreshed?
  • Time Frame: Is the data made available in incremental chunks (daily, weekly, or monthly extracts) or as a full extract?

Phase: Transform
  • Cleanup: How much sanitization, data filtration, and renaming (to make data report-friendly and understandable by its intended audience) is needed?
  • Structure: This step accounts for any calculations (e.g., computing bounce rates, de-duplicating data, summing/aggregating) or other logic based on business rules.
  • Data Integration: Are there any common keys available to join data sources together, or to facilitate data widening via a lookup or mapping file?

Phase: Load
  • Frequency: How “fresh” does the data need to be? For monthly reports, this typically isn’t a concern, but for daily reports it can pose a big challenge.
  • Import Type: Will data be loaded incrementally, or will the entire data set be overwritten with each cycle?
  • Format: What is the output of the load process? In other words, where is the data being sent for final reporting? Excel? Tableau? Data Studio 360? Some other reporting or visualization tool?

Phase: Visualization
  • Governance: Who will be viewing the data, and which reports should they have access to?
  • Distribution: Will the reports be shared via email? On a network shared drive? Or will users log in to a Web-based system and view reports there?
  • Software: Which software have you already invested in for building reports? Selection of software will depend highly on the reporting requirements, but also on the willingness of end users to learn a new tool and of reporting champions to enforce a particular tool/platform.

Before deciding on a solution or any one tool, it’s recommended that you take a step back and evaluate your organization’s reporting needs. By going through the above table and getting a better understanding of your data sources and how much data cleanup is needed, your organization will be far better equipped to make a solid business decision and build a reporting solution that will deliver insights with the speed, agility, and depth needed.

Advanced Use Cases for BigQuery/Tableau

Let’s take a step away from the architecture of reporting and review some examples of reports that we can build using two tools available to us in the overall ETLV stack: BigQuery and Tableau.

Use Case 1: Path Analysis

As we mentioned earlier, the data available in GA is very aggregate in nature. You can see overall traffic by Campaign, Top Landing pages by Campaign, and so on. You can also drill into Users who performed certain actions on your site or mobile app. What you can’t easily do within the GA interface is determine the order in which users, in aggregate, performed certain actions (even though we can refer to the User Explorer report for the series of actions completed by individual, anonymous users). For example, do the majority of users view a video first or download a PDF document prior to converting on your lead submission goal? The flow reports currently available in GA, while useful, don’t always answer these types of questions about aggregate, hit-level flow through our websites and apps.

Enter BigQuery.

As part of the integration with Analytics 360, the data exposed to BigQuery includes a layer of data that isn’t available within the GA interface. This hit-level data includes time/sequence information so we can do exactly this type of flow analysis. Within BigQuery we can drill into this data for a specific user (based on user ID) or even a specific user session (similar to the User Explorer report in GA).

In order to do this, let’s first find a user who was fairly active. To keep things simple, we’ll just focus on one day of data.

Here is a simple query to find us just the right user session to dive deeper into:

SELECT
  CONCAT(fullVisitorId, STRING(visitId)) AS userSession,
  totals.hits
FROM [8839142.ga_sessions_20150920]
ORDER BY totals.hits DESC
LIMIT 100

The results are shown in Figure 16.6.

Figure 16.6 Results of a simple query to extract total hits by visit ID.

With 80 hits, the session in row 1 shows a high level of engagement. Let’s go find out what pages this user looked at in this specific session. Actually, let’s take it one step further and combine this with the order in which pages were viewed as well.

Here is the query:

SELECT
  hits.hitNumber,
  hits.page.pagePath
FROM [8839142.ga_sessions_20150920]
WHERE visitId = 1442740881
  AND hits.type = 'PAGE'
LIMIT 100

The results of this query are shown in Figure 16.7.

Figure 16.7 Looking at the hits generated during an individual user’s session.

We can now see the sequence of pages this user viewed in his or her session:

  1. Viewed Social Analytics blog post.
  2. Navigated back to the home page (probably to learn more about our company).
  3. Went to the main blog page.
  4. Read GA account configuration blog post.
  5. Back to the main blog page.
  6. Read Benchmarking reports blog post.
  7. Back to the main blog page.
  8. Read Creative Remarketing blog post.
  9. Back to the main blog page.

You could also aggregate this data into a path analysis to get a sense of what engaged users are looking at within your site or mobile app (since this methodology could be used on screens as well as pages). For examples of aggregated visualizations, see www.e-nor.com/gabook.
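
As a hedged sketch of how such an aggregate analysis might start, the following Python code uses the google-cloud-bigquery client library to pull hit-level page views (from the same example dataset queried above) and then tallies the most common page-to-page transitions in memory. Very large sites would push this aggregation into SQL instead.

from collections import Counter
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT fullVisitorId, visitId, hits.hitNumber AS hitNumber,
       hits.page.pagePath AS pagePath
FROM [8839142.ga_sessions_20150920]
WHERE hits.type = 'PAGE'
ORDER BY fullVisitorId, visitId, hitNumber
"""
# The book's examples use legacy SQL, so flag the query accordingly
job_config = bigquery.QueryJobConfig(use_legacy_sql=True)
rows = client.query(sql, job_config=job_config).result()

# Count (page, next page) transitions within each session
transitions = Counter()
last_page = {}
for row in rows:
    session = (row.fullVisitorId, row.visitId)
    if session in last_page:
        transitions[(last_page[session], row.pagePath)] += 1
    last_page[session] = row.pagePath

for (from_page, to_page), count in transitions.most_common(10):
    print(from_page, '->', to_page, count)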

Use Case 2: Ecommerce

Let’s say you run an E-commerce store and want some more information on how your users interact with your products. We’ll start with a simple query: which products were purchased on a particular day? We’ll use a fictitious sporting goods store in our example.

SELECT
  hits.item.productName AS Product,
  hits.item.itemQuantity AS Quantity,
  hits.item.itemRevenue AS Revenue
FROM [hockeystore:049725.ga_sessions_20150901]
WHERE hits.item.productName != 'null'
  AND totals.transactions > 0
ORDER BY hits.item.itemRevenue DESC
LIMIT 100

The results are shown in Figure 16.8.

Figure 16.8 Results from a query showing products purchased from a sporting goods store on a single day.

The product called Skates sold a lot of units on this day. What if we were to answer the following business question: for users who purchased Skates, what other products did they purchase?

Here is the query to show this info:

SELECT
  hits.item.productName AS other_purchased_products,
  COUNT(hits.item.productName) AS quantity
FROM [hockeystore:049725.ga_sessions_20150901]
WHERE
  hits.item.productName IS NOT NULL
  AND hits.item.productName != 'Skates'
  AND fullVisitorId IN (
    SELECT fullVisitorId
    FROM [hockeystore:049725.ga_sessions_20150901]
    WHERE
      hits.item.productName CONTAINS 'Skates'
      AND totals.transactions >= 1
    GROUP BY fullVisitorId
    LIMIT 100)
GROUP BY
  other_purchased_products
ORDER BY
  quantity DESC;

Ignoring the complexity of the query itself, the point is that it shows us “people who bought a product called Skates also purchased the following products,” as illustrated in Figure 16.9 and Figure 16.10. That is immensely useful data and can form the basis of a recommendation engine to cross-sell other products.

Figure 16.9 Results of the query showing which other products were purchased by customers who purchased Skates.

Figure 16.10 Queries for also-purchased products could serve as the basis for a recommendation engine.
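
To hint at how the query output could feed such a recommendation engine, here is a small Python sketch that turns the also-purchased counts from Figure 16.9 (for our fictitious store) into a simple lookup a product page could call.

# Also-purchased counts as returned by the query above, keyed by product
also_purchased = {
    'Skates': [('Helmets', 16), ('Shoulder Pads', 13), ('Jersey', 5),
               ('Hockey Tape', 3), ('Skate Laces', 2)],
}

def recommend(product, n=3):
    # The lists are already sorted by quantity (ORDER BY quantity DESC)
    return [name for name, _ in also_purchased.get(product, [])[:n]]

print(recommend('Skates'))  # ['Helmets', 'Shoulder Pads', 'Jersey']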

Use Case 3: Advanced Funnel Analysis

Funnels are an oft-asked-about feature of GA. The funnel features in GA are useful but lack some key capabilities, such as the ability to segment the funnel on the fly or to apply the funnel retroactively to historical data. Typically, this type of analysis must be done outside of Google Analytics.

The following is a contribution from James Standen, founder of Analytics Canvas, discussing how to do such funnel analysis by leveraging Analytics Canvas.

As a note, if you have licensed Analytics 360, you can take advantage of the Custom Funnels feature discussed in Chapter 18.


Key Takeaways

  • Data visualization complements the reporting available in the Google Analytics user interface. While Google Analytics has a robust interface, many advanced use cases require pulling data out of GA and into a data visualization tool, particularly when we need to integrate with other data sets.
  • Sampling can severely and negatively impact your data quality. Pay attention to sampling, be aware of what causes it, and know which of your reports may be affected by it.
  • Hit-level data in BigQuery opens up a new world of analysis. BigQuery facilitates a very granular level of data analysis not available natively in Google Analytics. Learn this product—it’s the future.
  • Plan your report automation road map. True report automation nirvana can only be achieved with careful thought and attention given to the variety of factors driving reporting within your environment.

Actions and Exercises

  1. Is your data sampled? Check a few of your reports in GA to see which are sampled and which are not. You can check this by looking at the sampling indicator at the top of each report.
  2. Extract data outside GA. Using the methods defined in this chapter, try to get data out of GA using the available export or API functionality (or via a tool).
  3. Visualize. Use this data to create a visual by building a chart in any visualization tool—Excel, Tableau, and so on.