Thus far in the book, the discussions about Google Analytics (GA) data capture have centered around the GA JavaScript tracking code, analytics.js. Whether placed on the page directly or deployed through Google Tag Manager as we have demonstrated and recommended, tracking code execution tends to be our primary mental paradigm for GA data capture. In Chapter 13, we also worked with the Android and iOS SDKs to record mobile app data into GA.
In this chapter, we look at the two options beyond analytics.js and the SDKs for getting data into GA: data import (through the GA user interface or through the API), and the Measurement Protocol (MP), which allows any programmed, networked environment to push hits into GA in the form of basic HTTP requests.
We’ll use data import in most cases to add dimensions to analytics.js and mobile SDK hits; using the Measurement Protocol, we’ll record new hits.
The data import feature in GA may seem somewhat complicated at first, but it’s actually quite straightforward and very flexible. In each case, you’re generating a .csv (comma-separated values) data file and importing built-in or custom dimensions against a common key in GA.
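The key-and-target idea can be pictured as a simple lookup join. The following is only a conceptual sketch, not a description of how GA processes imports internally: the function name `widen_hits`, the sample visitor IDs, and the use of ga:dimension4 as the key are all illustrative assumptions.

```python
def widen_hits(hits, imported_rows, key, targets, overwrite=False):
    """Return copies of hits with target fields filled from rows matched on key."""
    # Index the imported rows by their key value for quick matching.
    lookup = {row[key]: row for row in imported_rows}
    widened = []
    for hit in hits:
        merged = dict(hit)  # copy so the recorded hit is left untouched
        match = lookup.get(hit.get(key))
        if match is not None:
            for target in targets:
                # Fill empty targets; replace existing values only if overwrite is set.
                if overwrite or not merged.get(target):
                    merged[target] = match[target]
        widened.append(merged)
    return widened


# Hypothetical recorded hits and a hypothetical imported row.
hits = [
    {"ga:dimension4": "V-1001", "ga:campaign": "spring-sale"},
    {"ga:dimension4": "V-1002", "ga:campaign": "newsletter"},
]
imported = [{"ga:dimension4": "V-1001", "ga:dimension6": "Qualified"}]

result = widen_hits(hits, imported, key="ga:dimension4", targets=["ga:dimension6"])
```

Only the first hit has a matching key, so only that hit gains a ga:dimension6 value; the second hit is returned unchanged.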
Several data import scenarios are outlined next.
In Chapter 15, “Integrating Google Analytics with CRM Data,” we recorded GA campaign information directly in our CRM (Salesforce) so we could correlate our medium, source, and campaign values with the lead status that our sales team assigned in the CRM. We also recorded a common visitor ID in both GA and our CRM to enable additional joins between the two data sets within the CRM, within a separate environment, or within GA.
We can take advantage of the GA data import feature to accomplish the join within GA. (Since the imported data would need to apply retroactively, this scenario would require query-time import available in Analytics 360. The content, product, and geo imports that appear later in the chapter would be suitable as processing-time or query-time import and would therefore also be applicable to GA Standard.) In this example, we’ll take the following steps to import lead status from the CRM for further correlation with GA data. Note that you need Edit access at the property level to perform the following procedure:
Create a Lead Status custom dimension in GA.
The data that we’re pulling in from Salesforce is the qualification status that our sales team assigned to each lead. Since there is currently no slot for this data in GA, we’ll need to create a custom dimension, as in Figure 17.1, to serve as the target of the data import.
Let’s say that we had previously created five other custom dimensions in the GA property and that this is the sixth. When we’re importing the CRM data into GA, we’ll add ga:dimension6 as a heading to the lead status column to match it to the lead status custom dimension.
Create the data set schema in GA.
Before we can import our CRM data into GA, we need to create a data set to receive the CRM data and define a schema to map the data import. In most cases, you’ll designate a single key field that will serve as the join between the data sets and one or more target fields to populate in GA from the imported data.
In the next step, designate the common field/dimension to be used as the join in the import and, under Imported Data, one or more target dimensions to copy the new data into, as shown in Figures 17.2 and 17.3.
Click Get Schema to confirm the heading names that we’ll need to add to our exported CRM data in step 3. In the lead status example, you’ll configure the schema with key and target as either ga:dimension4 and ga:dimension6, or ga:userId and ga:dimension6, depending on where you stored your visitor ID from the CRM (and assuming that you created the Lead Status custom dimension in the sixth custom dimension slot as in the example). If you previously stored the CRM visitor ID as both the User ID (for cross-device) and your Visitor ID custom dimension for CRM integration, you can use either as the key in your schema.
Export CRM data.
In Salesforce, as an example, we can create a report that contains one of two field sets, following the schema in either Figure 17.2 or 17.3.
Lead ID/Customer ID and lead status. If, instead of passing a client-generated visitor ID to your CRM when a lead or purchase form was initially submitted, you retrieved the lead or customer ID from your new CRM record and recorded it in GA as the visitor ID custom dimension or the cross-device user ID, export this lead/customer ID and lead status.
When you export your CRM report as comma-separated values (CSV), you’ll generate a file similar to Listing 17.1. For our example, we’ll call this file sf-export-lead-status.csv.
Note that you must add the heading row yourself. This schema example corresponds to Figure 17.2.
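Adding the heading row can be scripted rather than done by hand. The sketch below assumes the ga:dimension4 (visitor ID) and ga:dimension6 (lead status) schema from the example; the function name `add_schema_header` and the sample IDs and status values are hypothetical and are not taken from Listing 17.1.

```python
import csv
import io


def add_schema_header(crm_rows, header=("ga:dimension4", "ga:dimension6")):
    """Return CSV text with the GA schema heading row followed by the CRM rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)      # heading names must match the data set schema
    writer.writerows(crm_rows)   # one row per lead: key value, then lead status
    return buffer.getvalue()


# Hypothetical rows as they might come out of a CRM report export.
crm_rows = [("V-1001", "Qualified"), ("V-1002", "Unqualified")]
csv_text = add_schema_header(crm_rows)
```

Writing `csv_text` to a file such as sf-export-lead-status.csv would produce the file to upload.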
Once the import is completed, you’ll see a confirmation, as in Figure 17.5.
Note that you can also import data through the GA Management API instead of through the GA user interface. If you upload data to GA only on an ad hoc or infrequent basis, the interface method works well. If the upload is frequent or you want a hands-off, automated process, the Management API becomes the more attractive option.
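As a rough sketch of the automated route, the function below wraps the upload call exposed by the Google API Python client for the Management API (v3). It assumes that `service` was built elsewhere with `googleapiclient.discovery.build("analytics", "v3", ...)` using authorized credentials, and that `media_body` is a `googleapiclient.http.MediaFileUpload` wrapping the .csv file. The function name and its arguments are illustrative, and the data set ID is whatever GA shows for your own data set.

```python
def upload_import_file(service, account_id, property_id, data_set_id, media_body):
    """Upload a prepared .csv to a GA data set and return the API response.

    service     -- an authorized Analytics v3 client (assumed to be built elsewhere)
    data_set_id -- the data set ID shown in GA (called customDataSourceId in the API)
    media_body  -- a media upload object wrapping the .csv file
    """
    request = service.management().uploads().uploadData(
        accountId=account_id,
        webPropertyId=property_id,
        customDataSourceId=data_set_id,
        media_body=media_body,
    )
    return request.execute()
```

A scheduled job could export the CRM report, add the heading row, and then call this function, which removes the manual upload step.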
In many cases, such as the lead status example outlined in the previous section, the new data is imported into one or more custom dimensions. While custom dimensions do not appear by default in any of the built-in GA reports, you can apply custom dimensions as secondary dimensions in the built-in reports and use custom dimensions to define a custom segment or custom report as discussed in Chapter 10, “Segments,” and Chapter 11, “Dashboards, Custom Reports, and Intelligence Alerts.”
We could apply our imported Lead Status data as a secondary dimension in the Campaigns report as displayed in Figure 17.6. This report displays some of the same data as Table 15.4, but here the merge between CRM and GA data has occurred within GA and not the CRM or a separate environment.
It was in Chapter 12, “Implementation Customizations,” that we first discussed the classic examples of content dimensions that GA can’t record without help from you: author and category on blog or article pages. GA has no way to capture this data by default, so we populated the data layer with author and category variables from your content management system (CMS) since they did not already appear on the page, and we recorded author and category as custom dimensions with the pageview hit.
If, for any reason, your developers were not in a position to write that back-end data to the data layer so you could record it as a custom dimension with each hit, you could, instead, perform a data import for author and category against the page.
As mentioned above, GA Standard supports only processing-time imports other than Ecommerce refunds and cost data, so it would be beneficial to import this data as early as possible if you’re using GA Standard. (With Analytics 360, you can take advantage of query-time import to add data to hits that have already been processed.)
The procedure for importing content data is similar to importing user data as seen in the previous CRM example, but instead of matching on a visitor ID custom dimension or user ID, we’ll match our content on the Page dimension, or a portion of it.
To perform the author and category import, take the following steps:
Analyze the CMS key value relative to the GA Page dimension.
Before we create a data set and associated schema in GA, let’s consider the CMS export, especially the key, and how the key is represented in the page URL.
Your CMS is essentially a database that injects content into the page. Each page on your website exists as a record in your CMS. When a Web visitor requests a page on your website, the CMS (1) reads the page key from the URL, (2) pulls the corresponding fields from the CMS into a page template, and (3) sends the resulting HTML for the page to the requesting browser.
As a simple example, let’s say that three URLs in the news section of your meteorology website appear in the following format:
/news/article.php?articleId=3293
/news/article.php?articleId=4588
/news/article.php?articleId=5214
The corresponding rows in your CMS might be structured as in Table 17.1, with the articleId parameter in the URL serving as the unique key in the CMS.
For Key, select Page. In many cases, the key in the imported data set will correspond only to a portion of the Page value rather than the full Page value, as detailed in Table 17.2.
Keep the overwrite option set to No unless you’re performing an additional import and need to overwrite any previous author or category values that may have changed in the interim.
Since the key values previously listed in Table 17.1 correspond only with the value of the articleId query parameter in the Page dimension, we can specify articleId as the query refinement. This refinement applies only when your CMS key appears as the value in a name=value pair. The key in the data set configured in Figure 17.8 uses a query refinement on articleId.
If, instead, your CMS records were identified with a text key such as typhoon, and that text was incorporated into the Page value but not with the name=value format, you could instead use regex refinement to isolate the key within the URL pattern. In the regex refinement example in Table 17.2, /news/ identifies the static pattern of Page value, and ([^/]+) represents the dynamic portion of the Page value that corresponds to the CMS key.
You could also use a regex refinement to match a single CMS key to multiple page variations, such as /news/article.php?articleId=5214 and /news/article.php?articleId=5214&sessionId=12374, but, ideally, you should instead make every effort to consolidate your Page values by using the Exclude URL Query Parameters view setting as discussed in Chapter 9, “View Settings, View Filters, and Access Rights.”
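To make the two refinement types concrete, here is a small sketch of how a key could be isolated from a Page value. It only imitates the idea: the helper names are hypothetical, and using `re.search` (rather than some other matching mode) is an assumption about matching behavior, not a statement of how GA evaluates refinements.

```python
import re
from urllib.parse import parse_qs, urlsplit


def key_from_query(page, parameter):
    """Query refinement: return the value of one name=value pair in the Page."""
    values = parse_qs(urlsplit(page).query).get(parameter)
    return values[0] if values else None


def key_from_regex(page, pattern):
    """Regex refinement: return the first capture group matched in the Page."""
    match = re.search(pattern, page)
    return match.group(1) if match else None


# Both page variations resolve to the same CMS key.
query_key = key_from_query("/news/article.php?articleId=5214", "articleId")
variant_key = key_from_query(
    "/news/article.php?articleId=5214&sessionId=12374", "articleId"
)

# A text key embedded in the path, outside any name=value pair.
regex_key = key_from_regex("/news/typhoon", r"/news/([^/]+)")
```

Here `query_key` and `variant_key` both come out as 5214, and `regex_key` comes out as typhoon, matching the keys in Table 17.2.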
Upload the .csv file shown in Listing 17.2 as in the lead status example above.
This data import populates author and category just as if we were recording custom dimensions with each pageview on the meteorology website, and we can now create a custom report to display performance by author and category.
Table 17.1 Sample Content Management System (CMS) Records
Article ID | Title | Meta Description | Main Content | Author | Category |
3293 | Winter Outlook 2017 | Long-range forecast for winter 2017 | It appears that winter in the northern hemisphere … | Andrew Cullen | Forecasts |
4588 | Worldwide Water Update | Comprehensive water study | As we analyze hydrological data from around the world … | Stacy Hamida | Hydrology |
5214 | Typhoon Watch | Latest tracking for Pacific cyclones | This typhoon season in the Pacific is proving to be very active … | Andrew Cullen | Cyclones |
Table 17.2 Key Refinements for Page Dimension
Page dimension in GA | Key in CMS | Refinement |
/news/article.php?articleId=5214 | 5214 | query refinement: articleId |
/news/typhoon | typhoon | regex refinement: /news/([^/]+) |
In Chapter 7, “Acquisition Reports,” we discussed the extreme importance of using the utm_medium, utm_source, and utm_campaign URL parameters to more accurately record our traffic sources and to populate the campaigns report with accurate, structured campaign data as in Listing 17.3.
There is, however, another option for populating campaign parameters into GA: you can pass a single utm_id parameter in the URL, as shown in Listing 17.4, and then import the campaign parameters (and, optionally, custom dimensions) against this ID as the key.
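The difference between the two tagging formats can be sketched as follows. The landing-page URL, the helper name `tag_url`, and the parameter values are illustrative assumptions and are not copied from Listings 17.3 and 17.4.

```python
from urllib.parse import urlencode


def tag_url(base_url, **params):
    """Append campaign parameters to a landing-page URL as a query string."""
    return f"{base_url}?{urlencode(params)}"


# Full format: medium, source, and campaign are visible in the URL.
full_url = tag_url(
    "https://www.example.com/",
    utm_medium="email",
    utm_source="asia-list",
    utm_campaign="20160701-newsletter",
)

# Simplified format: only an ID travels in the URL; the remaining
# campaign values are imported against that ID as the key.
short_url = tag_url("https://www.example.com/", utm_id="198")
```

The simplified URL carries only the ID, so the medium, source, and campaign values reach GA solely through the imported file.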
Why might you use the simplified campaign parameter format in Listing 17.4? For one thing, it can be a bit compromising to explicitly display campaign parameters for your website visitors to plainly see. In most instances, direct indications of marketing descriptions can only be a distraction from the user and brand experience that you aim to provide. In other cases, you might be dealing with advertising platforms that allow a single campaign ID only.
To import campaign data:
Define the schema as in Figure 17.9. Note that we have created a Campaign Group custom dimension, which you can use (in a custom report or segment, or as a secondary dimension) to distinguish between product campaigns and informational resource campaigns. Also, as mentioned in Chapter 7, campaign name is not actually mandatory for campaign tracking, but it’s always recommended as best practice.
Populate a .csv file as in Table 17.3.
Table 17.3 Representation of the Key, Campaign Parameters, and a Custom Dimension to Match the Schema in Figure 17.9
ga:campaignCode | ga:medium | ga:source | ga:campaign | ga:dimension8 |
198 | | asia-list | 20160701-newsletter | info-resource |
199 | qa-code | catalog | 20160705-spring-discount-code-qa | product |
200 | | europe-list | 20160708-newsletter | info-resource |
201 | social | | 20160709-spring-discount-code-fb | product |
202 | social | | 20160710-spring-discount-code-li | product |
By linking your Google AdWords and GA accounts and enabling AdWords auto-tagging as described in Chapter 14, “Google Analytics Integrations—The Power of Together,” you can view AdWords cost data within GA. While GA does not offer direct integrations with other paid advertising platforms, you can manually import cost data from platforms such as Bing or Facebook.
Importing cost data is fairly similar to the data imports we reviewed previously in this chapter, but a few special considerations are detailed in the following procedure.
Once you import campaign cost data, you’ll be able to access the Campaigns > Cost Analysis report, as shown in Figure 17.12.
Notice, however, two types of data that the Cost Analysis report does not include:
To incorporate cost and performance data for AdWords and non-AdWords campaigns into a single report, we can easily configure a custom report as shown in Figure 17.13.
Product data import is quite straightforward and allows you to add or overwrite many Ecommerce and Enhanced Ecommerce dimensions, as well as the price metric, using product SKU as the key.
Product SKU or name is, in fact, the only product detail that you are required to provide when you’re coding your actual Enhanced Ecommerce interactions on your website; you could potentially provide SKU only at transaction time and match to imported Ecommerce dimensions.
In addition to the dedicated Ecommerce dimensions, you can import custom dimensions—such as size, color, warranty level, or any other product or service descriptor—per SKU, as the custom report in Figure 17.14 illustrates. Keep in mind that you could also populate any of these built-in or custom dimensions while the user is interacting with your Web pages or mobile app; the product data import is a different option for associating this extra data with each SKU.
In addition to the geographic hierarchy GA offers by default, you can create your own geographic divisions by importing against city, region (which usually corresponds to province or state), country, or subcontinent.
For instance, you could create a custom dimension named US Regional and import a value such as Northeast, Southeast, and so on against the built-in Region ID dimension value for each U.S. state and then report performance by these imported dimension values, as in Figure 17.15.
When Google Universal Analytics was first released and began to be adopted, we as GA users observed that most of GA’s functionality remained the same. There were a few additional admin settings, and the syntax for native tracking had changed to a simpler format, but not too much else had changed in terms of day-to-day reporting.
Several of the new capabilities are what makes Universal universal: cross-device tracking, custom metrics and additional custom dimensions, and, perhaps more than any other feature, the MP. The MP is arguably the most universal part of Universal Analytics in that it allows you to send Hypertext Transfer Protocol (HTTP) requests from any programmed, networked device or environment to record data into GA.
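A minimal sketch of such a request follows: it builds the body of a single pageview hit. The parameter names (v, tid, cid, t, dp) and the /collect endpoint come from the Measurement Protocol for Universal Analytics; the tracking ID, client ID, and page path are placeholders, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Collection endpoint for Universal Analytics Measurement Protocol hits.
MP_ENDPOINT = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id, client_id, page_path):
    """Return the URL-encoded body of one Measurement Protocol pageview hit."""
    payload = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # GA property ID (placeholder below)
        "cid": client_id,    # anonymous client identifier
        "t": "pageview",     # hit type
        "dp": page_path,     # document path
    }
    return urlencode(payload)


body = build_pageview_hit("UA-XXXXX-Y", "555", "/kiosk/welcome")
```

Any environment that can issue an HTTP POST can then send `body` to `MP_ENDPOINT`, for example with `urllib.request.urlopen(MP_ENDPOINT, data=body.encode())`.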
Measurement Protocol is a much more specialized usage of GA, but it’s important to know that this option exists. Scenarios for MP usage include mobile apps designed for Windows, Blackberry, or other mobile operating systems for which GA SDKs are not available. In the following sections, Hazem Mahsoub provides key insights on MP, and Matt Stannard walks us through two innovative and outcomes-focused examples of the MP in action.
Data import is based on keys and targets. You import based on a common key in GA and the uploaded .csv, and populate dimensions and metrics in GA based on the key match.
Import custom dimensions (and metrics). In addition to importing built-in dimensions (such as product category), you can import custom dimensions (such as product size or article author). The import of custom dimensions was formerly referred to as dimension widening, since the process adds dimensions to user, hit, and product data that has already been recorded.
Data import retroactivity. If you’re using Analytics 360, you have the option of query-time import for retroactivity. In GA Standard, data import is performed at processing time, that is, on a go-forward basis only. Cost-data import does apply retroactively, even in GA Standard.
Key refinements for content imports. The Page dimension usually serves as the key for content imports, but in many cases, the key value in the .csv file matches only a portion of the Page dimension value. In this case, you can apply a regex or query refinement to extract only a part of the Page value as a match against the key value in the import file.
Import campaign parameters based on campaign ID. If you prefer not to display the utm_medium, utm_source, and utm_campaign parameters directly in your URLs, or if you’re working with an advertising platform that allows only a single parameter, you could add a single, discreet utm_id parameter, which will populate the ga:campaignCode dimension that you can use as a key for importing medium, source, and campaign.
Import cost data from advertising channels. You can import cost data from other advertising channels such as Bing or Facebook and compare metrics such as cost per conversion or Ecommerce transaction across those channels and AdWords, whose cost data reaches GA automatically through account linking and auto-tagging, with no manual import.
While the combination of medium and source values serves as the key in the schema for cost data import, the campaign value acts as the de facto key in many cases, so you must ensure that any campaign value that you’re importing from the .csv exactly matches a campaign value that you’ve already populated in GA.
Measurement Protocol. You can use the Measurement Protocol to send data through HTTP requests from an environment where the GA tracking code or SDKs cannot run, such as a Windows or Blackberry app or a kiosk.