You are now familiar with the Snowflake data warehouse (DW) and its advantages over other DW solutions. However, a typical organization won’t be using Snowflake alone. Snowflake is part of an analytics solution that consists of multiple components, including business intelligence and data integration tools.
- Modern analytics solution architecture
- Snowflake partner ecosystem
- Integration with Matillion ETL and Tableau
This chapter will help you to learn how to build an end-to-end solution using leading cloud tools for business intelligence and data integration. You will launch Matillion ETL and load data into the Snowflake DW. In addition, you will connect to Tableau Desktop and build dashboards.
Modern Analytics Solution Architecture
Nowadays, every organization wants to be data-driven to generate more value for customers and stakeholders. The organization’s management understands the value of data and treats it as an asset. They are ready to invest in modern cloud solutions like Snowflake that are scalable and secure. However, Snowflake is just one part of the analytical ecosystem. It is the core data storage for all organization data, and it provides robust access to the data.
Figure 11-1 gives you an idea of how a typical analytics solution can look. I’ve already added Matillion ETL and Tableau to the diagram because we are going to use them in this chapter. However, you have a choice to use other products as well.
Key Elements of Architecture Diagram
Element | Description |
---|---|
Source layer | The source layer includes all the data sources available at your organization. This could include transactional databases, files, NoSQL databases, business applications, external APIs, sensors, and IoT. |
Storage layer | The storage layer is the core of the solution. You may hear about data platforms, data lakes, and data warehouses; this is the place for all of them. You ingest data into the storage layer from the source layer and store it for further analysis, data discovery, and decision-making. |
Access layer | The access layer is nontechnical. The main goal is to provide access for business users and allow them to interact with data through BI and SQL. |
Stream | Streaming is a method of data ingestion in which data is processed in near real time. For example, you may collect data from sensors under a strict SLA to analyze the data and make decisions quickly. |
Batch | Batch processing is a method of data ingestion in which data is loaded at scheduled intervals. For example, for DWs, we typically load data once per day, though sometimes we need to load data more frequently. |
Snowflake | Snowflake is a cloud data warehouse that can also serve as a data lake. It can collect data from both batch and streaming pipelines. |
Matillion ETL | Matillion ETL is a cloud-native tool that is responsible for the extract, load, and transform (ELT) process. It was built for the cloud and provides a visual interface for building data pipelines. The ELT tool is responsible for all data movement and data transformation. |
SqlDBM | SqlDBM is a cloud data modeling tool. It was the first cloud tool that was built for Snowflake. Without a proper data model, you can’t deliver a quality DW. Moreover, it helps to communicate with business stakeholders and collaborate with a team. |
Tableau | Tableau is a visual analytics platform that connects to Snowflake, provides access for business users, and helps them slice and dice data and deliver insights. In other words, it is a business intelligence tool. |
Data science tools | Data science tools provide advanced analytics capabilities. This could be an open source product, a programming language (R/Python), or an enterprise solution like Databricks (Spark). |
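To make the batch and streaming rows of the table concrete, here is a hedged sketch in Snowflake SQL. The stage, pipe, and table names are made up for illustration; `COPY INTO` is the typical batch-loading path, while Snowpipe (`CREATE PIPE`) is Snowflake's mechanism for continuous, near-real-time ingestion.

```sql
-- Batch: load yesterday's files from an external stage once per day
-- (stage and table names are hypothetical).
COPY INTO raw.sales
  FROM @my_s3_stage/sales/
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Streaming-style: Snowpipe continuously ingests new files as they
-- arrive in the stage, instead of waiting for a daily schedule.
CREATE PIPE raw.sales_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw.sales
    FROM @my_s3_stage/sales/
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```

The difference is purely in cadence: the same `COPY INTO` logic runs once per day in the batch case and continuously in the Snowpipe case.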
In this chapter, we will show how to build a simple solution using Matillion ETL, Snowflake, and Tableau. We won’t spend much time setting up a real source system; instead, we will use sample data sets that we load into Snowflake with Matillion and then visualize with Tableau. Moreover, we won’t build a streaming solution or talk about lambda architecture. Based on our experience, in 80 percent of use cases a data warehouse, business intelligence, and ELT are sufficient for a typical organization.
Snowflake Partner Ecosystem
For our solution, we should choose a data integration tool and a BI tool. Based on our rich experience with DW, BI, and data integration, our favorite tools for working with Snowflake are Matillion ETL and Tableau. They are leaders in their respective areas and allow us to build a modern analytics solution that meets business requirements and SLAs.
Building Analytics Solutions
Let’s get started.
Getting Started with Matillion ETL
Matillion ETL is a cloud data integration tool. It is available for Snowflake, Redshift, and BigQuery. It increases development speed, secures data, provides rich data transformation functionality, and offers many prebuilt data connectors for Salesforce, Mailchimp, Facebook, and others. One of the biggest advantages of the tool is that it looks and feels like a traditional ETL tool, with a friendly user interface where developers can drag and drop components to build their data pipelines.
List of Matillion Objects
Object | Name |
---|---|
Database | PC_MATILLION_DB |
Warehouse | PC_MATILLION_WH (X-Small) |
Role | PC_MATILLION_ROLE |
Username | Snowflake-snowflake |
After activation, the tool will immediately transfer you to the Matillion ETL web interface. It is already connected to your Snowflake cluster, so you can start working immediately. This reduces your time to market.
Let’s load some initial data into Snowflake using Matillion.
Note
Our Snowflake cluster is hosted on AWS. When we launched a Matillion ETL instance from the Partner Connect page, we created the EC2 instance with Matillion ETL. It was created in a different AWS account. We can launch Matillion ETL in our AWS account by finding it in the AWS Marketplace. In this case, we will get full control over the Matillion ETL instance, connect via SSH, use an application load balancer, adjust security groups, and so on.
Running Our First Job with Matillion ETL
- 1.
Log into Matillion ETL. You can use the URL, password, and username that you’ve received in the Matillion activation e-mail.
- 2.
Navigate to Partner Connect Group ➤ Snowflake Project. You will find two demo jobs, called dim_airport_setup and dim_airports.
- 3.
Open the dim_airport_setup job by double-clicking it. Figure 11-4 shows the key elements of the Matillion web interface.
Key Elements of Matillion ETL Web Interface
Element in Figure 11-4 | Description |
---|---|
1 | The job list pane includes all the jobs that you are building for this project. Moreover, you can organize jobs with folders. There are two types of jobs. A transformation job (green) is responsible for data transformation inside Snowflake. An orchestration job (blue) is responsible for extracting and loading data from/to external sources. |
2 | The Components pane includes all the components available for this job type (blue or green). You can easily drag and drop components and build pipelines. |
3 | The “Shared job” pane lists the shared jobs. A shared job is a reusable, packaged job: you can build a job once and then reuse it as a single component. |
4 | The Environments pane lists the environments. For example, by default we have one environment that is connected to our Snowflake cluster. If we want to load data into a different Snowflake cluster or from another AWS account, we should create a new environment and specify the credentials. |
5 | Canvas is our development environment where we can drag and drop components from the Components pane or “Shared job” pane and organize them into the data pipeline. Moreover, we can add notes with tips and documentation. |
6 | The Project menu is the main menu for Matillion ETL, where you can manage existing projects, switch to others, and manage variables, schedules, API profiles, and many others. |
7 | The Admin menu is available for the Matillion administrator. From this menu you can manage Matillion users, manage backups, and download logs. |
8 | The Help menu allows you to get support information, manage active sessions, and manage license keys. |
9 | The component properties pane is where you configure the options of the currently selected component, such as its target table, source location, and other parameters. |
10 | The Status menu provides information about currently running tasks, shows the command log, and displays notices about available updates. |
You have learned about the key elements of the Matillion web interface, so you can now run a job. Right-click the canvas and choose “Run job (snowflake).” Matillion will run the current job using the environment named snowflake. This job consists of multiple steps.
- a.
Create tables using the Create Table component.
- b.
Load data from S3 into the staging tables using the S3 Load component.
- c.
Execute the transformation job dim_airport that will transform raw semistructured data into a tabular format and load it into a dimension table.
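Under the hood, components such as Create Table and S3 Load emit ordinary Snowflake SQL. The following is an illustrative approximation of the three steps above; the column names, the `airports_raw` staging table, and the bucket placeholder are guesses for illustration, not the exact DDL Matillion generates.

```sql
-- Step a: the Create Table component creates a staging table
-- (here with a single VARIANT column for semistructured data).
CREATE OR REPLACE TABLE airports_raw (v VARIANT);

-- Step b: the S3 Load component copies files from the public
-- Matillion sample bucket (placeholder path, not the real bucket).
COPY INTO airports_raw
  FROM 's3://<matillion-sample-bucket>/airports/'
  FILE_FORMAT = (TYPE = JSON);

-- Step c: the transformation job flattens the raw semistructured
-- data into a tabular dimension table.
CREATE OR REPLACE TABLE matillion_dim_airports AS
SELECT v:iata::STRING    AS iata,
       v:airport::STRING AS airport,
       v:state::STRING   AS state
FROM airports_raw;
```

The value of Matillion here is that it generates and orchestrates this kind of SQL for you, while Snowflake does the actual processing.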
Note During this exercise, we loaded the Matillion sample data set that is stored in Matillion’s public Amazon S3 bucket, which is available to everyone. If you have Snowflake on Azure, you will load data from Blob Storage instead.
- 4.
After the job is finished, we can go back to the Snowflake web UI and check the new objects that were created by Matillion. Figure 11-5 shows the list of Snowflake tables that were created by the Matillion orchestration job.
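If you prefer SQL to the web UI, you can list the newly created objects directly, assuming the Partner Connect database and schema names shown earlier:

```sql
-- Inspect the objects Matillion created in the Partner Connect database.
USE DATABASE PC_MATILLION_DB;
SHOW TABLES IN SCHEMA PUBLIC;
```

This is a quick way to confirm that the orchestration job produced the staging and dimension tables before moving on to the BI layer.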
We launched Matillion ETL and loaded sample data into the Snowflake DW. In a real-world scenario, we would create many more jobs and collect data from external sources. For example, for marketing analytics use cases, we need to load data from social media platforms such as Facebook, Twitter, YouTube, and so on. Matillion ETL provides prebuilt connectors that will save time for data engineers or ETL developers.
Moreover, for a quality solution, we should design a data model for querying our data. We might choose a modeling technique such as Data Vault or dimensional modeling. The best choice for modeling a Snowflake DW is SqlDBM.
The final step is to connect a BI tool. We need a BI tool to simplify access for nontechnical users. With Tableau, business users can do data discovery using drag-and-drop methods and powerful analytics and visualization capabilities. For our sample solution, we will install Tableau Desktop and connect to the matillion_dim_airports table in order to visualize the data.
Getting Started with Tableau
Tableau is a leading visual analytics platform. There are many tools available on the market, but Tableau stands out among them. We have worked with many different tools from leading vendors and found that Tableau is the most powerful tool for business intelligence and self-service. Moreover, it has a large and friendly community. If you have never worked with Tableau, now is a good time to try it. Connecting Tableau to Snowflake allows us to use best-of-breed technologies working together. Tableau is available in Server and Desktop versions. Moreover, it has a mobile application. Let’s get Tableau and connect to the Snowflake cluster.
Building Our First Visualization with Tableau and Snowflake
During this exercise, we will install Tableau Desktop and connect it to the Snowflake DW. Then we will visualize the matillion_dim_airports data.
- 1.
Let’s download and install Tableau Desktop. Go to https://www.tableau.com/products/desktop/download and download a recent version of Tableau Desktop. It is available for macOS and Windows. Then install it.
- 2.
Open Tableau Desktop and connect to Snowflake, as shown in Figure 11-6.
Note To connect to the Snowflake DW, you need to download the ODBC driver from the Snowflake web UI. Select Help ➤ Download ➤ ODBC driver, then download and install it.
- 3.
Then, enter your credentials to connect to Snowflake from Tableau. You can use the Matillion credentials that were created during Matillion ETL initialization (including the user role), or you can use your Snowflake admin credentials. Figure 11-7 shows an example of the connection options.
- 4.
Click Sign In and then enter the following:
- a.
Warehouse: PC_MATILLION_WH
- b.
Database: PC_MATILLION_DB
- c.
Schema: Public
Then drag and drop the matillion_dim_airports table to the connection canvas.
- 5.
Click Sheet 1, and you will jump into the development area. You just created your first Tableau live data source.
Note The Tableau data source supports live and extract options. Extract will query all data from the data source and cache it in Tableau’s internal columnar data store, called Hyper. A live connection will query the data source on demand, which is the right strategy for large volumes of data: Snowflake does the heavy lifting, and Tableau renders the result. This is the secret to doing big data analytics.
- 6.
Let’s create a quick visualization using the available data. Say we want to know the number of airports across states, ordered in descending order. In Figure 11-8 you can see the Tableau Desktop interface and a simple report.
- 7.
It is interesting to look at Snowflake to see what was happening while we built our report. From the Snowflake web UI, on the History tab, we can see the SQL query that was generated by Tableau, shown here:

SELECT "matillion_dim_airports"."iata" AS "iata",
       "matillion_dim_airports"."state" AS "state"
FROM "PUBLIC"."matillion_dim_airports" "matillion_dim_airports"
GROUP BY 1, 2
Moreover, we are able to look at the execution plan. This is helpful when we are working with large data sets and multiple tables.
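Besides the History tab, query history is also available through SQL, which is convenient when auditing what a BI tool sends to the warehouse:

```sql
-- Most recent queries visible to the current session; the
-- information_schema table function retains about 7 days of history.
SELECT query_id, query_text, total_elapsed_time
FROM TABLE(information_schema.query_history())
ORDER BY start_time DESC
LIMIT 10;
```

Running this right after building the report should show the Tableau-generated query above, along with its elapsed time.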
We have connected the Snowflake DW to Tableau Desktop. The next logical step is to publish the report to Tableau Server and share it with stakeholders.
Note
With Tableau, you can leverage the unique features of Snowflake such as querying and visualizing semistructured data, working with the Time Travel feature, sharing data, implementing role-based security, and using custom aggregation. Moreover, we can integrate Tableau and Matillion. You can find more good information about this at https://rockyourdata.cloud/best-practices-matillion-etl-and-tableau/.
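As one example of these features, semistructured data can be exposed to Tableau through a view that flattens a VARIANT column, so business users see ordinary rows and columns. The table, column, and attribute names below are hypothetical:

```sql
-- Flatten a hypothetical raw.events table with a VARIANT payload
-- into typed columns that Tableau can consume directly.
CREATE OR REPLACE VIEW public.events_flat AS
SELECT payload:user_id::NUMBER        AS user_id,
       payload:event_type::STRING     AS event_type,
       payload:ts::TIMESTAMP_NTZ      AS event_ts
FROM raw.events;
```

Because the view is live in Snowflake, Tableau's live connection picks up new semistructured records without any extract refresh.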
Summary
In this chapter, we covered the Snowflake partner ecosystem, and you learned about a modern analytics architecture and its key elements. Moreover, we connected to Matillion ETL, the best cloud ELT tool for Snowflake, and ran our first job. Then, we built a report with Tableau, the best visual analytics tool. By the end of this chapter, we had created an analytics solution that can be scaled and is ready for production use. Using this example, you can build your own analytics solution and get immediate value.
In the next chapter, we will talk about some data use cases for Snowflake. You will learn how Snowflake can handle a large volume of data.