16
Advanced Reporting and Visualization with Third-Party Tools

In this chapter, we turn our focus to advanced reporting techniques, including data extraction and cleanup, as well as reporting automation. We’ll also continue our discussion of data integration from Chapter 15.

To put it bluntly, advanced reporting—and report automation specifically—can be a bit of a bird’s nest. Advanced reporting is a series of sometimes very manual steps that combine to form a report/chart/dashboard or enable deeper ad hoc analysis. The end product, which may look simple and elegant, is the outcome of a complex process.

With all the data being collected and all the different segmentation features available in GA, you would think that reporting and dashboarding data would be a problem solved long ago. Advanced reporting, however, remains one of the biggest challenges facing organizations today. Certainly, GA (among other tools) makes it easy to construct a basic report. Designing a reporting solution, however, that takes data sampling and automation into consideration is an entirely different matter. An effective report has the following characteristics:

  • It takes data accuracy into consideration (no sampling).
  • It has an elegant and simple presentation.
  • It is automated so it doesn’t require hours and hours of construction time.
  • It is delivered at strategic intervals in an easy-to-consume format.
  • Most importantly, it provides business insight.

Producing a report or dashboard that excels in all the above areas can be daunting for many organizations. Analysts often spend countless hours manually copying data from various analytics tools, pasting it into Excel or PowerPoint, cleaning it up, and then building an unusable and unattractive report, which is then emailed to stakeholders, usually without additional context about the data.

The reality is that it takes a significant amount of time and effort to work with the data contained in analytics tools, extract it into a format we can work with, make it report-friendly so that report viewers will understand it, and then share it with our users through automation. Remember the time you put together a report of your Top 10 Landing Pages and the number 1 viewed page was “/”? The analysts, of course, know that “/” represents your home page, but this isn’t something that most people, particularly executives, will know or even should be expected to know. If you have not already rewritten “/” in your GA view settings as described in Chapter 9, rewriting it now to a more report-friendly name that management can easily interpret will have a profound impact. It’s our responsibility as architects of the reporting solution to construct the report in a way that is easy to consume for its intended audience. Sounds simple enough, right? Well, it kind of is, but combine this with the many other similar data inconsistencies that exist, and it’s not so straightforward to do this in an automated way.

At a recent conference on sports analytics, one of the speakers was the Director of Analytics for an NBA (the U.S. National Basketball Association) team. In addition to offering many valuable analytics insights, he said, “What do you think I spend 90% of my time doing? I’m basically cleaning up data in spreadsheets.” That really cemented for me just how important it is for us to leverage tools that can automate as much of the data extraction, cleanup, and visualization process as possible. Analysts shouldn’t be spending their time copying and pasting data, renaming data, and building graphs and charts. Analysts should spend their time mining for insights and discovering trends and patterns that can impact the business.

In this chapter, we cover how to get data out of GA, the limitations of each method, and the factors to consider along the way to help guide us through the maze. At the end of the chapter we discuss three advanced use cases for extracting and visualizing GA data.

Framing the Issue: How to Get Data Out of GA

There are several ways of getting data out of GA. We’ll focus on the methods that are most effective in supporting our goal of automated reporting.

Core Reporting API

The GA Core Reporting API (https://developers.google.com/analytics/devguides/reporting/core/v3/) provides a basic querying system to request metrics and dimensions in the form of tabular data from GA. It can access the majority of report data contained within GA. The returned data looks and feels similar to what you would see within the interface, and enables you to build dashboards and automate reporting tasks outside of the GA interface. For example, you can leverage the API to extract GA data and integrate it into a Web page, Excel, or any other application. Note that the Core Reporting API is the only automated extraction method available to GA Standard users.

Following is an example of a query to the Core Reporting API. In this example, we are querying for the top 10 Channels during the last 30 days.

https://www.googleapis.com/analytics/v3/data/ga
  ?ids=ga%3A73156703
  &start-date=30daysAgo
  &end-date=yesterday
  &metrics=ga%3Asessions%2Cga%3AbounceRate
  &dimensions=ga%3AchannelGrouping
  &sort=-ga%3Asessions
  &max-results=10

The result of this query is shown in Figure 16.1.

Figure 16.1 Output of a typical query to the Core Reporting API.
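
If you prefer scripting to constructing raw URLs, the same query can be issued through Google’s client libraries. Below is a minimal Python sketch using the google-api-python-client library; the service account key file name is a placeholder, and the view (profile) ID is the one from the example query above.

from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

# Authenticate with a service account key (placeholder file name); the
# service account must be granted read access to the GA view.
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    'service-account-key.json',
    ['https://www.googleapis.com/auth/analytics.readonly'])
analytics = build('analytics', 'v3', credentials=credentials)

# Same query as above: top 10 channels by sessions over the last 30 days
response = analytics.data().ga().get(
    ids='ga:73156703',
    start_date='30daysAgo',
    end_date='yesterday',
    metrics='ga:sessions,ga:bounceRate',
    dimensions='ga:channelGrouping',
    sort='-ga:sessions',
    max_results=10).execute()

for row in response.get('rows', []):
    print(row)  # [channel, sessions, bounce rate]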

Unsampled Request API

The Unsampled Request API, available to Analytics 360 only, differs from the Core Reporting API in that it allows you to access unsampled data. While the Core Reporting API allows you to dynamically combine metrics and dimensions, the Unsampled Request API instead allows you to access predefined unsampled reports in comma-separated values (CSV) format.
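
To give a feel for the mechanics, here is a hedged Python sketch that requests an unsampled report through the Management API; the account, property, and view IDs are placeholders, and analytics is the authorized service object from the previous sketch. Because the report is generated asynchronously, you poll for completion and then retrieve the CSV from Google Drive or Cloud Storage.

# Request an unsampled report; Google processes it asynchronously.
report = analytics.management().unsampledReports().insert(
    accountId='12345678',            # placeholder account ID
    webPropertyId='UA-12345678-1',   # placeholder property ID
    profileId='73156703',            # placeholder view (profile) ID
    body={
        'title': 'Unsampled Channels Report',
        'start-date': '2015-01-01',
        'end-date': '2015-12-31',
        'metrics': 'ga:sessions,ga:bounceRate',
        'dimensions': 'ga:channelGrouping',
    }).execute()

# Poll until the report is ready; when status is COMPLETED, the response's
# driveDownloadDetails (or cloudStorageDownloadDetails) points to the CSV.
status = analytics.management().unsampledReports().get(
    accountId='12345678',
    webPropertyId='UA-12345678-1',
    profileId='73156703',
    unsampledReportId=report['id']).execute()
print(status['status'])  # PENDING, then COMPLETED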

Third-Party Tools

Analytics Canvas

Analytics Canvas is a tool that can be used to automate the extract, transform, load (ETL) process. It can connect directly to GA via the Core Reporting API as well as the Unsampled Request API, and to GA data that has been stored in BigQuery. It can also connect to many types of databases and to Excel files. Data can then be modified as needed and automatically uploaded to a database of your choice.

The software requires a Windows-based computer with access to all data sources. You would set up Analytics Canvas to connect to those sources, manipulate the data, and then automatically push it to the desired output location.

Other tools that offer more advanced but conceptually similar functionality to Analytics Canvas include Informatica, SQL Server Integration Services (SSIS), IBM InfoSphere, SAP Data Services, and many others. Analytics Canvas, however, is primarily designed to work with analytics and marketing data sources and is well suited for GA purposes.

Tools such as SSIS are good at extraction and transformation from internal databases, but either don’t offer connections to GA/BigQuery or offer limited functionality (with respect to GA).

Analytics Canvas can connect to the GA data using three methods:

  • Core Reporting API
  • Unsampled Request API
  • BigQuery

Not only does Analytics Canvas offer direct connectivity to GA, but since it’s designed primarily for analytics data, it offers the ability to extract unsampled data, something that other ETL tools typically don’t.

For GA Standard users who are restricted to the Core Reporting API, Analytics Canvas leverages a creative method of addressing data sampling called Query Partitioning. This effectively segments a query into smaller chunks to reduce the number of sessions within each chunk. For example, if your reporting period were one year, Query Partitioning would split the query into 12 smaller queries. The process is transparent to the user, other than that the query takes longer to execute. It’s a clever way of mitigating the effects of sampling, but as the word mitigate suggests, it only reduces the effects of sampling; it doesn’t eliminate them. Depending on the volume of data being reported on, Query Partitioning may have a major or perhaps only a marginal effect on your data.
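
Canvas’s internal implementation isn’t public, but the concept is easy to sketch. The following Python fragment (reusing the hypothetical analytics service object and view ID from the Core Reporting API sketch earlier) partitions a one-year query into twelve monthly queries and sums the results, so each individual query covers fewer sessions and is less likely to cross the sampling threshold.

from datetime import date, timedelta

def month_ranges(year):
    # Yield (start, end) ISO date pairs, one per calendar month
    for month in range(1, 13):
        start = date(year, month, 1)
        nxt = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        yield start.isoformat(), (nxt - timedelta(days=1)).isoformat()

total_sessions = 0
for start, end in month_ranges(2015):
    response = analytics.data().ga().get(
        ids='ga:73156703',
        start_date=start,
        end_date=end,
        metrics='ga:sessions').execute()
    total_sessions += int(response['totalsForAllResults']['ga:sessions'])

# Additive metrics such as sessions can be summed across partitions; ratios
# such as bounce rate must be recomputed from their underlying components.
print(total_sessions)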

A sample of a “Canvas” is shown in Figure 16.2.

Figure 16.2 The visual blocks within the canvas make it extremely easy to follow the extraction and transformations occurring within your data set.

BigQuery

BigQuery is a querying tool that leverages the power and speed of Google’s cloud infrastructure to store and query billions of rows of data in seconds. Initially developed as an internal tool within Google’s own technology stack, it is amazingly fast and enables complex data processing and segmentation.

BigQuery is a different product entirely from GA—the two aren’t directly related. However, there is an integration available between Analytics 360 and BigQuery. As part of the integration, Analytics 360 exports data to BigQuery on a nightly basis. Not only is the majority of existing GA data available in BigQuery, but BigQuery’s robust structure enables us to see deeper, more granular hit-level data. This additional layer of data enables us to understand and analyze user behavior with respect to the sequence in which activities were performed in a session.

BigQuery is one of several storage options within the Google Cloud Platform, as shown in Figure 16.3.

Figure 16.3 The Google Cloud Platform

Is BigQuery a Relational Database?

Although BigQuery is designed to run SQL-like queries, it cannot be considered a relational database management system. Relational Database Management Systems (RDBMSs) such as Oracle, SQL Server, DB2, or MySQL are designed to efficiently perform all CRUD operations (create, read, update, delete); BigQuery is designed primarily for storage and the read operation—specifically, to run faster queries on extremely large data sets while avoiding the sampling that an RDBMS may apply.

We explore some advanced use cases for BigQuery later in this chapter.

For the datasets, tables, rows, and columns available for GA data exported to BigQuery, see “BigQuery Export Schema” in the GA help docs:

https://support.google.com/analytics/answer/3437719?hl=en

Tableau

Tableau is our recommended tool for visualizing reports. It is one of the leading data visualization tools on the market today and is extremely powerful for dashboard creation, ad hoc analysis, and building a self-service reporting solution.

Tableau has three components:

  • Tableau Reader: This is freely available for download and install. Its only function is to enable the user to view Tableau report files, similar to Adobe Reader for PDF files. If a file-sharing report distribution system is used, end users must install Tableau Reader to open the native-format Tableau files. Tableau Reader is available in both Windows and Mac versions.
  • Tableau Desktop: This is the report-building environment of the Tableau suite. A select group of users who are well trained in Tableau will typically have access to this software and will be able to view the raw data from each data source. Within this environment, report builders can either perform ad hoc analysis, or build reports/dashboards/visualizations to be shared as files (opened by Tableau Reader above) or published to the Web-based Tableau Server (mentioned below). Tableau Desktop is available in both Windows and Mac versions.
  • Tableau Server: The Web-based Tableau Server acts as a publishing environment for Tableau reports. This software would be installed on a central Web server within your network. Report builders build reports and publish them to this server, to be consumed by end users. End users log in through the browser (since it is Web-based, eliminating the need for Tableau Reader) and view reports/dashboards/visualizations as allowed by a role-based permissions model. (Tableau Server is currently available for Windows servers only.)

What Factors Dictate Which Tool to Use?

A considerable factor in devising a reporting solution is resolving the issue of the many disparate data sources that cannot expose data in a way that allows it to be pulled automatically. To facilitate an end-to-end automation process, we recommend the acquisition of a middle-layer tool called Analytics Canvas (aka Canvas), described earlier.

With all these tools available, it can of course be hard to determine which one to use. There are several factors that impact the solution we should choose, but one of the primary factors is sampling, discussed in Chapter 11. Since sampling in GA directly affects the quality and integrity of your data, Table 16.1 may help in determining which tools to use, or at least in eliminating options.

Table 16.1 A Summary of Available Data Extraction Methods/Tools, and Ideal Uses

  • Core Reporting API: Subject to the same sampling thresholds as the GA Standard interface (500,000 sessions within the reporting period); a maximum of 10,000 rows is returned per query. Tools such as Canvas can help mitigate (not eliminate) the effects of sampling by using the Query Partitioning feature.
  • Unsampled Request API (Analytics 360 only): Provides access to preconfigured unsampled reports. A great way of exporting unsampled data, but limited access makes it a bit cumbersome: unsampled reports must either be manually downloaded from within the tool (or emailed), or accessed as CSV files via a Google Drive account.
  • BigQuery (Analytics 360 only): Provides access to unsampled hit-level data. Can be used in two models: (1) as a data hub where you can upload additional data sources and join them with GA data; (2) as a vehicle for accessing unsampled hit-level GA data.
  • Analytics Canvas: Can connect to the Core Reporting API, to the Unsampled Request API for fully unsampled data, and to BigQuery. Works in tandem with a visualization tool; by itself, it only provides a facility to extract and transform data.
  • Tableau: Contains a connector to access GA data, but it is prone to significant sampling issues. A much better solution is to feed data to Tableau via integration with Analytics Canvas, or to use the automated export from Analytics 360 to BigQuery and connect to BigQuery from Tableau.

ETLV–The Full Reporting Automation Cycle

ETL stands for extract, transform, load. It is a BI-oriented process for loading data from source systems into a target system to enable business reporting. We added a V at the end to make this process a little more current and complete. The V stands for visualize, of course!

The overall solution will function as follows:

  1. Extract the data.
  2. Transform the data.
  3. Load the data into a reporting platform.
  4. Visualize.

This process is illustrated broadly in Figure 16.5.

Figure 16.5 Data flow through ETLV.
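
To make the four steps concrete, here is a compressed Python sketch of the cycle; the query, the report-friendly renaming, and the output file are all illustrative, and in practice a tool such as Analytics Canvas would own the middle steps.

import pandas as pd

# 1. Extract: pull rows via the Core Reporting API (the analytics service
#    object is the one built in the earlier sketch)
response = analytics.data().ga().get(
    ids='ga:73156703', start_date='30daysAgo', end_date='yesterday',
    metrics='ga:sessions', dimensions='ga:landingPagePath').execute()
df = pd.DataFrame(response['rows'], columns=['Landing Page', 'Sessions'])

# 2. Transform: fix types, apply report-friendly names, keep the top 10
df['Sessions'] = df['Sessions'].astype(int)
df['Landing Page'] = df['Landing Page'].replace({'/': 'Home Page'})
df = df.sort_values('Sessions', ascending=False).head(10)

# 3. Load: write to a location the reporting layer can read
df.to_csv('landing_pages.csv', index=False)

# 4. Visualize: point Tableau (or Excel, Data Studio, etc.) at the output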

There are several factors that need to be considered in the ETLV process to extract/obtain data from heterogeneous data sources, modify (transform) it, and then load it into a data-reporting tool for visualization in an automated way. We’ve broken down some of the factors in Table 16.2.

Table 16.2 Factors at Each Stage of an Automated ETLV Reporting Solution

Phase: Extract
  • Data Sources: A typical business may need to pull data from platforms such as Google Analytics, WebTrends, or Adobe SiteCatalyst; marketing automation tools like Marketo or Eloqua; CRM tools like Salesforce; display advertising data such as AdWords, DoubleClick, and AdMob; e-commerce data; app data (from the Apple App Store or Google Play); or any number of internal databases.
  • Format: The format the data is stored in, and how it is exposed, will dictate the method we use to extract it.
  • Frequency: How often is the data made available or refreshed?
  • Time Frame: Is the data made available in incremental chunks (daily, weekly, or monthly extracts) or as a full extract?

Phase: Transform
  • Cleanup: How much sanitization, data filtration, and renaming (to make data report-friendly and understandable by its intended audience) is needed?
  • Structure: This step accounts for any calculations (e.g., computing bounce rates, de-duplicating data, summing/aggregating) or other logic based on business rules.
  • Data Integration: Are there any common keys available to join data sources together, or to facilitate data widening via a lookup or mapping file?

Phase: Load
  • Frequency: How “fresh” does the data need to be? For monthly reports, this typically isn’t a concern, but for daily reports it can pose a big challenge.
  • Import Type: Will data be loaded incrementally, or will the entire data set be overwritten with each cycle?
  • Format: What is the output of the load process? In other words, where is the data being sent for final reporting? Excel? Tableau? Data Studio 360? Some other reporting or visualization tool?

Phase: Visualization
  • Governance: Who will be viewing the data, and which reports should they have access to?
  • Distribution: Will the reports be shared via email? On a network shared drive? Or will users log in to a Web-based system and view reports there?
  • Software: Which software have you already invested in for building reports? Selection of software will depend highly on the reporting requirements, but also on the willingness of end users to learn a new tool and of reporting champions to enforce a particular tool/platform.

Before deciding on a solution or any one tool, it’s recommended that you take a step back and evaluate your organization’s reporting needs. By going through the above table and getting a better understanding of your data sources and how much data cleanup is needed, your organization will be far better equipped to make a solid business decision and build a reporting solution that will deliver insights with the speed, agility, and depth needed.

Advanced Use Cases for BigQuery/Tableau

Let’s take a step away from the architecture of reporting and review some examples of reports that we can build using two tools available to us in the overall ETLV stack: BigQuery and Tableau.

Use Case 1: Path Analysis

As we mentioned earlier, the data available in GA is very aggregate in nature. You can see overall traffic by Campaign, Top Landing pages by Campaign, and so on. You can also drill into Users who performed certain actions on your site or mobile app. What you can’t easily do within the GA interface is determine the order in which users, in aggregate, performed certain actions (even though we can refer to the User Explorer report for the series of actions completed by individual, anonymous users). For example, do the majority of users view a video first or download a PDF document prior to converting on your lead submission goal? The flow reports currently available in GA, while useful, don’t always answer these types of questions about aggregate, hit-level flow through our websites and apps.

Enter BigQuery.

As part of the integration with Analytics 360, the data exposed to BigQuery includes a layer of data that isn’t available within the GA interface. This hit-level data includes time/sequence information so we can do exactly this type of flow analysis. Within BigQuery we can drill into this data for a specific user (based on user ID) or even a specific user session (similar to the User Explorer report in GA).

In order to do this, let’s first find a user who was fairly active. To keep things simple, we’ll just focus on one day of data.

Here is a simple query to find us just the right user session to dive deeper into:

SELECT
  CONCAT(fullVisitorId, STRING(visitId)) AS userSession,
  totals.hits
FROM [8839142.ga_sessions_20150920]
ORDER BY totals.hits DESC
LIMIT 100

The results are shown in Figure 16.6.

Figure 16.6 Results of a simple query to extract total hits by visit ID.

With 80 hits, the session in row 1 shows a high level of engagement. Let’s go find out what pages this user looked at in this specific session. Actually, let’s take it one step further and combine this with the order in which pages were viewed as well.

Here is the query:

SELECT
  hits.hitNumber,
  hits.page.pagePath
FROM [8839142.ga_sessions_20150920]
WHERE visitId = 1442740881
  AND hits.type = 'PAGE'
LIMIT 100

The results of this query are shown in Figure 16.7.

Figure 16.7 Looking at the hits generated during an individual user’s session.

We can now see the sequence of pages this user viewed in his or her session:

  1. Viewed Social Analytics blog post.
  2. Navigated back to the home page (probably to learn more about our company).
  3. Went to the main blog page.
  4. Read GA account configuration blog post.
  5. Back to the main blog page.
  6. Read Benchmarking reports blog post.
  7. Back to the main blog page.
  8. Read Creative Remarketing blog post.
  9. Back to the main blog page.

You could also aggregate this data into a path analysis to get a sense of what engaged users are looking at within your site or mobile app (since this methodology could be used on screens as well as pages). For examples of aggregated visualizations, see www.e-nor.com/gabook.
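
As a hedged sketch of how such an aggregate analysis might start, the following Python code uses the google-cloud-bigquery client library to pull hit-level page views (from the same example dataset queried above) and then tallies the most common page-to-page transitions in memory. Very large sites would push this aggregation into SQL instead.

from collections import Counter
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT fullVisitorId, visitId, hits.hitNumber AS hitNumber,
       hits.page.pagePath AS pagePath
FROM [8839142.ga_sessions_20150920]
WHERE hits.type = 'PAGE'
ORDER BY fullVisitorId, visitId, hitNumber
"""
# The book's examples use legacy SQL, so flag the query accordingly
job_config = bigquery.QueryJobConfig(use_legacy_sql=True)
rows = client.query(sql, job_config=job_config).result()

# Count (page, next page) transitions within each session
transitions = Counter()
last_page = {}
for row in rows:
    session = (row.fullVisitorId, row.visitId)
    if session in last_page:
        transitions[(last_page[session], row.pagePath)] += 1
    last_page[session] = row.pagePath

for (from_page, to_page), count in transitions.most_common(10):
    print(from_page, '->', to_page, count)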

Use Case 2: Ecommerce

Let’s say you run an E-commerce store and want some more information on how your users interact with your products. We’ll start with a simple query: which products were purchased on a particular day? We’ll use a fictitious sporting goods store in our example.

SELECT
  hits.item.productName AS Product,
  hits.item.itemQuantity AS Quantity,
  hits.item.itemRevenue AS Revenue
FROM [hockeystore:049725.ga_sessions_20150901]
WHERE hits.item.productName != 'null'
  AND totals.transactions > 0
ORDER BY hits.item.itemRevenue DESC
LIMIT 100

The results are shown in Figure 16.8.

Figure 16.8 Results from a query showing products purchased from a sporting goods store on a single day.

The product called Skates sold a lot of units on this day. What if we were to answer the following business question: for users who purchased Skates, what other products did they purchase?

Here is the query to show this info:

SELECT
  hits.item.productName AS other_purchased_products,
  COUNT(hits.item.productName) AS quantity
FROM [hockeystore:049725.ga_sessions_20150901]
WHERE
  hits.item.productName IS NOT NULL
  AND hits.item.productName != 'Skates'
  AND fullVisitorId IN (
    SELECT fullVisitorId
    FROM [hockeystore:049725.ga_sessions_20150901]
    WHERE
      hits.item.productName CONTAINS 'Skates'
      AND totals.transactions >= 1
    GROUP BY fullVisitorId
    LIMIT 100)
GROUP BY
  other_purchased_products
ORDER BY
  quantity DESC;

Ignoring the complexity of the query itself, the point is that it shows us “people who bought a product called Skates also purchased the following products,” as illustrated in Figure 16.9 and Figure 16.10. That is immensely useful data and can form the basis of a recommendation engine to cross-sell other products.

Figure 16.9 Results of the query showing which other products were purchased by customers who purchased Skates.

Figure 16.10 Queries for also-purchased products could serve as the basis for a recommendation engine.
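
To hint at how the query output could feed such a recommendation engine, here is a small Python sketch that turns the also-purchased counts from Figure 16.9 (for our fictitious store) into a simple lookup a product page could call.

# Also-purchased counts as returned by the query above, keyed by product
also_purchased = {
    'Skates': [('Helmets', 16), ('Shoulder Pads', 13), ('Jersey', 5),
               ('Hockey Tape', 3), ('Skate Laces', 2)],
}

def recommend(product, n=3):
    # The lists are already sorted by quantity (ORDER BY quantity DESC)
    return [name for name, _ in also_purchased.get(product, [])[:n]]

print(recommend('Skates'))  # ['Helmets', 'Shoulder Pads', 'Jersey']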

Use Case 3: Advanced Funnel Analysis

Funnels are an oft-asked-about feature of GA. The funnel features in GA are useful but lack some key capabilities, such as the ability to segment the funnel on the fly or to apply the funnel retroactively to historical data. Typically, this type of analysis must be done outside of Google Analytics.

The following is a contribution from James Standen, founder of Analytics Canvas, discussing how to do such funnel analysis by leveraging Analytics Canvas.

As a note, if you have licensed Analytics 360, you can take advantage of the Custom Funnels feature discussed in Chapter 18.


Key Takeaways

  • Data visualization complements the reporting available in the Google Analytics user interface. While Google Analytics has a robust interface, many advanced use cases require pulling data out of GA and into a data visualization tool, particularly when we need to integrate with other data sets.
  • Sampling can severely and negatively impact your data quality. Pay attention to sampling, be aware of what causes it, and know which of your reports may be affected by it.
  • Hit-level data in BigQuery opens up a new world of analysis. BigQuery facilitates a very granular level of data analysis not available natively in Google Analytics. Learn this product—it’s the future.
  • Plan your report automation road map. True report automation nirvana can only be achieved with careful thought and attention given to the variety of factors driving reporting within your environment.

Actions and Exercises

  1. Is your data sampled? Check a few of your reports in GA to see which are sampled and which are not. You can check this by looking at the sampling indicator at the top of each report.
  2. Extract data outside GA. Using the methods defined in this chapter, try to get data out of GA using the available export or API functionality (or via a tool).
  3. Visualize. Use this data to create a visual by building a chart in any visualization tool—Excel, Tableau, and so on.