Providing security is a challenge for many organizations today, especially in the cloud, given the number of threats and attacks that occur daily. Safeguarding data is paramount for Snowflake. The Snowflake services platform was built with security in mind from the beginning, and the company has implemented a security framework that we believe addresses many of its customers’ compliance challenges.
Developers have to secure their data and prevent unauthorized access to it, which is why Snowflake automatically encrypts all data, both at rest and in transit. In addition, Snowflake provides multifactor authentication and supports federated authentication.
One of the challenges with on-premises solutions is that data can reside at many different locations, so controlling the data flow and who’s accessing it is challenging. With the cloud, you can build the right security controls to safeguard your data, but security doesn’t stop there. There are many more aspects that are related to monitoring and ensuring the system is constantly protected.
The Snowflake platform is a cloud-native solution, and its security is managed for you. Snowflake provides an end-to-end security solution to its customers: from the moment the data leaves a customer’s premises, through the untrusted Internet, to the point when it arrives at Snowflake storage, the data is protected. Moreover, Snowflake hardens all the virtual machines that data resides on. Snowflake encrypts data, performs audits, monitors the system, sends alerts, and installs patches on a continuous basis. All of this simplifies the security efforts of customers, who do not necessarily have to incur all the procedural and compliance costs associated with security.
This chapter covers the following topics:
Snowflake security reference architecture
Network and site access
Account and user authentication
Object security
Data security
Security validations
Snowflake Security Reference Architecture
Snowflake’s architecture consists of three layers:
Storage layer, where all the data is stored in a columnar compressed format and is always encrypted.
Compute layer, composed of virtual warehouses, which are the compute nodes that perform all of the data processing. Multiple virtual warehouses can work on the same data at the same time.
Services layer, also known as the “brains” of Snowflake. This is where all security information and metadata is stored and where queries are parsed and optimized before the virtual warehouses execute them. The services layer also includes transaction management, which coordinates across all of the virtual warehouses, allowing for a consistent set of operations against the same data at the same time.
This unique architecture allows Snowflake to ensure a high standard of security for its customers. Figure 8-1 shows Snowflake’s security reference architecture. It describes the components that make up Snowflake’s secure data warehouse. We will cover the key elements of this diagram in this chapter.
Note
This chapter will cover the security features that are available to date. Snowflake is constantly working on adding new features.
Virtual Private Cloud
First is the concept of a virtual private cloud (VPC). Snowflake is implemented as a VPC within the cloud provider’s infrastructure. If a customer requires complete isolation from other Snowflake customers because of strict security requirements, as in the case of a financial institution, the Virtual Private Snowflake (VPS) edition must be used. With VPS, the Snowflake implementation runs entirely in its own VPC within the cloud provider’s infrastructure.
Physical Security
Each cloud provider, including Amazon Web Services, Microsoft Azure, and Google Cloud Platform, provides its own infrastructure and physical security to guard all of its cloud data. Physical security includes 24-hour armed guards and video surveillance to ensure no unauthorized access is allowed in the data center. Neither Snowflake personnel nor Snowflake customers have access to these data centers. Data redundancy is also a standard practice implemented by the cloud provider for data recovery.
You can learn more about physical security from each cloud vendor by visiting their documentation.
Network and Site Access
All customer access to the Snowflake service via the Internet is made via the secure protocol HTTPS. Moreover, all Internet communications between users and the Snowflake service are secured and encrypted using TLS 1.2 or higher.
All connections to Snowflake are secure, regardless of the method used to connect, whether via the web user interface or the ODBC or JDBC connectors. Authentication is required to gain access to Snowflake. These connections are encrypted and communicate solely over HTTPS.
Access to Snowflake is subject to network policies. These policies provide options for managing network configurations to the Snowflake service, such as restricting access to an account based on a user IP address. Currently, Snowflake customers can implement a network policy to create an IP whitelist, which is a list of allowed IP addresses, as well as an IP blacklist, which lists those IP addresses that are forbidden access.
Figure 8-2 shows the Snowflake web UI for managing access policies.
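The way an allowed list and a blocked list interact can be sketched in a few lines of Python. This is only an illustration of the idea, not Snowflake’s implementation: the function name `is_allowed` and the sample addresses are hypothetical, and we assume that a blocked entry takes precedence over an allowed one.

```python
from ipaddress import ip_address, ip_network


def is_allowed(client_ip, allowed, blocked):
    """Return True if client_ip passes the (hypothetical) network policy.

    Assumption: blocked entries are checked first, so an address that is
    both allowed and blocked is rejected.
    """
    addr = ip_address(client_ip)
    # A blocked address or range always wins.
    if any(addr in ip_network(entry) for entry in blocked):
        return False
    # Otherwise the address must fall inside at least one allowed range.
    return any(addr in ip_network(entry) for entry in allowed)


# Hypothetical policy: allow one office subnet, block a single host in it.
allowed_ips = ["192.168.1.0/24"]
blocked_ips = ["192.168.1.99"]

print(is_allowed("192.168.1.10", allowed_ips, blocked_ips))
print(is_allowed("192.168.1.99", allowed_ips, blocked_ips))
print(is_allowed("10.0.0.1", allowed_ips, blocked_ips))
```

The first address is inside the allowed subnet and passes, the second is explicitly blocked, and the third matches no allowed range at all.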
For increased network connectivity security, private and direct communication between Snowflake and other VPCs can be set up via AWS PrivateLink (in the case of an AWS deployment). This feature, which effectively creates a private tunnel of communication between Snowflake and the VPC, is currently available only for the Business Critical Edition, formerly known as Enterprise for Sensitive Data (ESD), or VPS customers.
Account and User Authentication
For account access and user authentication, multifactor authentication (MFA) can be implemented for increased security on account access by users. MFA support is provided as an integrated Snowflake feature powered by the Duo security service and managed completely by Snowflake. The only additional task after enabling MFA is to install the Duo mobile application, which is supported on multiple smartphone platforms including iOS, Android, and Windows.
Currently, each user must enable MFA by themselves. As a security best practice, all users with the ACCOUNTADMIN role should enroll in MFA.
Single sign-on (SSO) is a user authentication method that, once enabled, allows users to authenticate through an external SAML 2.0-compliant identity provider (IdP).
When authenticated, users can securely initiate one or more sessions in Snowflake for the duration of their IdP session. These sessions can be initiated from within the interface provided by the IdP or directly from within Snowflake. This feature is available for customers on Enterprise Edition and up.
Object Security
Access to specific objects within Snowflake, such as warehouses, databases, schemas, and tables, is controlled by a hybrid model of discretionary access control (DAC) and role-based access control (RBAC).
Note
Discretionary access control (DAC) is when each object has an owner, who can in turn grant access to that object. Role-based access control (RBAC) is when access privileges are assigned to roles, which are in turn assigned to users.
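To make the hybrid model more concrete, here is a minimal Python sketch. It is not how Snowflake stores privileges; the class `AccessModel`, its methods, and the sample role and object names are all hypothetical. We assume that the owner of an object is a role, that only the owning role may grant privileges on it (the DAC part), and that users obtain privileges only through the roles assigned to them (the RBAC part).

```python
class AccessModel:
    """Toy model combining DAC (object owners) with RBAC (roles)."""

    def __init__(self):
        self.owner = {}        # object name -> owning role
        self.grants = {}       # (role, object name) -> set of privileges
        self.user_roles = {}   # user name -> set of roles

    def create_object(self, obj, owner_role):
        self.owner[obj] = owner_role

    def grant(self, granting_role, privilege, obj, to_role):
        # DAC: only the owner of the object may grant access to it.
        if self.owner.get(obj) != granting_role:
            raise PermissionError(f"{granting_role} does not own {obj}")
        self.grants.setdefault((to_role, obj), set()).add(privilege)

    def assign_role(self, user, role):
        # RBAC: privileges reach users only through roles.
        self.user_roles.setdefault(user, set()).add(role)

    def can(self, user, privilege, obj):
        for role in self.user_roles.get(user, ()):
            if self.owner.get(obj) == role:
                return True  # the owning role can do anything
            if privilege in self.grants.get((role, obj), ()):
                return True
        return False


model = AccessModel()
model.create_object("sales_db", owner_role="sales_admin")
model.grant("sales_admin", "SELECT", "sales_db", to_role="analyst")
model.assign_role("maria", "analyst")

print(model.can("maria", "SELECT", "sales_db"))
print(model.can("maria", "DELETE", "sales_db"))
```

In this sketch, maria can read `sales_db` because her role was granted SELECT by the owning role, but she cannot delete from it because no such privilege was granted.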
Data Security
Encryption is enabled by default in Snowflake. All customer data is encrypted at rest. This includes not only the database data but also the virtual warehouse cache and query results cache, which are both used for performance optimization within Snowflake. All communication is encrypted in transit over public networks and even within the Snowflake virtual private cloud for customers who use the Business Critical Edition.
Note
Advanced Encryption Standard (AES) is a symmetric encryption algorithm. The algorithm was developed by two Belgian cryptographers, Joan Daemen and Vincent Rijmen. AES was designed to be efficient in both hardware and software and supports a block length of 128 bits and key lengths of 128, 192, and 256 bits.
All files that are stored in Snowflake internal stage objects are automatically encrypted using either AES-128 or AES-256 strong encryption. Specific editions of Snowflake also provide periodic rekeying of encrypted data and support for customer-managed encryption keys.
The Business Critical Edition of Snowflake supports the Tri-Secret Secure feature. This encryption is achieved using key wrapping, which means using one key to lock up another. For example, if a user attempts to access encrypted data within Snowflake, the data must first be decrypted. To decrypt it, the data key is necessary, but the data key itself is also encrypted, or wrapped, and requires another key, which is the table key. Again, the table key is locked and requires yet another key, the account key, to unlock it. The account key is also locked and can be accessed only with the root key, which is stored in a hardware security module (HSM), such as Amazon CloudHSM in the case of an AWS implementation.
Amazon CloudHSM is a piece of hardware that is specialized for encryption. The account key would need to be passed into CloudHSM and unlocked by the root key. Then the hierarchy of table and data keys can be subsequently unlocked, and the unencrypted data can be returned to the user.
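The chain of wrapped keys described above can be sketched in Python. Note the assumptions: the `toy_encrypt` and `toy_decrypt` functions are a deliberately simple stand-in built from SHA-256 so that the example runs with the standard library only. They are not AES and must not be used to protect real data. All names are hypothetical, and the root key here is an ordinary variable, whereas in the real system it never leaves the HSM.

```python
import hashlib
import os


def _keystream(key, nonce, length):
    # Toy keystream derived from SHA-256; a stand-in for a real cipher.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key, plaintext):
    """NOT real encryption: illustrates locking data (or a key) with a key."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


# Build the hierarchy: each key is wrapped by the key one level above it.
root_key = os.urandom(32)      # assumption: would live inside the HSM
account_key = os.urandom(32)
table_key = os.urandom(32)
data_key = os.urandom(32)

wrapped_account = toy_encrypt(root_key, account_key)
wrapped_table = toy_encrypt(account_key, table_key)
wrapped_data = toy_encrypt(table_key, data_key)
ciphertext = toy_encrypt(data_key, b"customer row")


def read_data(root, wrapped_account, wrapped_table, wrapped_data, ciphertext):
    # Unlock each level in turn: root -> account -> table -> data key.
    account = toy_decrypt(root, wrapped_account)
    table = toy_decrypt(account, wrapped_table)
    data = toy_decrypt(table, wrapped_data)
    return toy_decrypt(data, ciphertext)


print(read_data(root_key, wrapped_account, wrapped_table, wrapped_data, ciphertext))
```

Only the wrapped keys and the ciphertext need to be stored next to the data. Without the root key, none of the lower keys can be recovered, which is the point of keeping it in specialized hardware.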
Encryption keys are rotated automatically for accounts running on certain editions of Snowflake. The entire process of rotating encryption keys is completed behind the scenes and is transparent to the end user. With key rotation, a new version of a key is created and used to encrypt new data, while the previous version is retired and used only to decrypt the data it already protects. In other words, with key rotation, new data gets fresh keys.
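The active and retired states of a key can be illustrated with a small Python sketch. The `KeyRing` class and its methods are hypothetical names introduced for this example, and the sketch says nothing about how or when Snowflake actually schedules rotation.

```python
import os


class KeyRing:
    """Toy versioned key store: one active key, older versions retired."""

    def __init__(self):
        self.versions = {}   # version number -> (key bytes, state)
        self.active = None
        self.rotate()        # create version 1

    def rotate(self):
        # Retire the current key; it stays available for decryption only.
        if self.active is not None:
            key, _ = self.versions[self.active]
            self.versions[self.active] = (key, "retired")
        new_version = (self.active or 0) + 1
        self.versions[new_version] = (os.urandom(32), "active")
        self.active = new_version

    def key_for_encrypt(self):
        # New data is always encrypted with the active version.
        return self.active, self.versions[self.active][0]

    def key_for_decrypt(self, version):
        # Old data records which version encrypted it.
        return self.versions[version][0]


ring = KeyRing()
v1, k1 = ring.key_for_encrypt()
ring.rotate()
v2, k2 = ring.key_for_encrypt()

print(v1, ring.versions[v1][1])
print(v2, ring.versions[v2][1])
```

After the rotation, version 1 is retired but can still be fetched with `key_for_decrypt(1)`, while all new encryption uses version 2.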
Snowflake takes security seriously, which is why the end-to-end encryption of data is a default feature of the service. Whether data is in flight between the customer and internal stage or at rest and stored in a Snowflake database table, the data is always in an encrypted state.
To protect data against loss, Snowflake leverages data redundancy implemented by the cloud infrastructure provider. Each cloud provider region consists of several data centers dispersed across several miles within the region. The cloud infrastructure within each region provides automatic synchronous replication of data to three different zones for redundancy. Should one zone fail, the data remains available from one of the other two zones in the region.
Security Validation
Snowflake Security Validations
Type | Description |
---|---|
SOC 2 Type II | Designed for service providers storing customer data in the cloud. It requires companies to establish and follow strict information security policies and procedures encompassing the security, availability, processing integrity, and confidentiality of customer data. |
HIPAA | Stands for Health Insurance Portability and Accountability Act. Passed in 1996, HIPAA is a federal law that sets a national standard to protect medical records and other personal health information. |
PCI DSS | Stands for Payment Card Industry Data Security Standard. This standard sets the requirements for organizations and sellers to safely and securely accept, store, process, and transmit card holder data during credit card transactions to prevent fraud and data breaches. |
CAIQ | Stands for the Consensus Assessments Initiative Questionnaire (CAIQ). This is a survey provided by the Cloud Security Alliance (CSA) for cloud consumers and auditors to assess the security capabilities of a cloud service provider. |
Snowflake Audit and Logging
If you click the SQL text, a dialog will pop up with a success or failure message, as well as with actions to take to resolve any errors.
Another field in the activity log is Query ID. This ID can be used by Snowflake Support to look up a specific query instance for troubleshooting. Again, Snowflake personnel do not have access to customer data but can access metadata such as the query statement and query plan.
Clicking the Query ID field in the activity log will jump to the Query Profiler, allowing the user to view how the query optimizer worked and if there are any bottlenecks to resolve.
Query Profiler
When we work with data warehouses and business intelligence, we often have to deal with performance issues. To understand why a query or a report is slow, we should understand the mechanics of querying. Query Profiler helps us spot typical mistakes in SQL query expressions and identify potential performance bottlenecks and improvement opportunities.
Key Elements of Query Profiler
Element | Description |
---|---|
Steps | If the query was processed in multiple steps, you can toggle between each step. |
Operator tree | The middle pane displays a graphical representation of all the operator nodes for the selected step, including the relationships between each operator node. |
Node list | The middle pane includes a collapsible list of operator nodes by execution time. |
Overview | The right pane displays an overview of the query profile. The display changes to operator details when an operator node is selected. |
You can find more information about Query Profiler in the Snowflake documentation at https://docs.snowflake.net/manuals/user-guide/ui-query-profile.html.
Login History Audit Logs
Snowflake provides table functions for extracting audit log history from the metadata. The login history family of table functions can be used to look up user login history with various filters such as time range or specific user.
Login History Audit Functions
Function | Description |
---|---|
LOGIN_HISTORY | Returns login events within a specified time range |
LOGIN_HISTORY_BY_USER | Returns login events of a specified user within a specified time range |
Query History Audit Logs
Query History Audit Log Functions
Function | Description |
---|---|
QUERY_HISTORY | Returns queries within a specified time range |
QUERY_HISTORY_BY_SESSION | Returns queries within a specified session and time range |
QUERY_HISTORY_BY_USER | Returns queries submitted by a specified user within a specified time range |
QUERY_HISTORY_BY_WAREHOUSE | Returns queries executed by a specified warehouse within a specified time range |
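These table functions are called from SQL with named arguments. As a rough sketch, the following Python helper assembles such a statement as a string. The helper name `table_function_sql` is hypothetical, and we assume the functions are reached through `information_schema` and accept arguments such as `USER_NAME` and `RESULT_LIMIT`, so check the Snowflake documentation for the exact arguments of each function before relying on this.

```python
def table_function_sql(function_name, **named_args):
    """Build a SELECT over an audit table function (illustrative only).

    Assumption: arguments are passed in the NAME => value form and the
    function lives in the information_schema of the current database.
    """
    def render(value):
        # Quote strings (doubling embedded quotes); leave numbers as-is.
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return str(value)

    args = ", ".join(
        f"{name.upper()} => {render(value)}" for name, value in named_args.items()
    )
    return (
        "select * from table("
        f"information_schema.{function_name.lower()}({args}))"
    )


# Hypothetical example: the last 50 queries submitted by user ANALYST.
sql = table_function_sql("QUERY_HISTORY_BY_USER", user_name="ANALYST", result_limit=50)
print(sql)
```

The resulting string could then be run through any Snowflake client. Building SQL by string formatting is acceptable for a fixed, trusted set of arguments like this one, but user-supplied values should go through the client’s parameter binding instead.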
Penetration Testing
Penetration tests are an integral part of Snowflake’s ongoing testing of security controls and procedures. Seven to ten tests are performed each year to ensure no new holes or flaws arise in security. If a vulnerability is found, the security team will log and track it to closure. The results of these penetration tests are available to customers under NDA with Snowflake.
You can find more information about penetration testing in the article “Snowflake: Serious about security” by Susan Walsh at https://www.snowflake.com/blog/snowflake-seriously-serious-security/.
Summary
In this chapter, we covered Snowflake’s security features in the following categories:
Network/site access
Account/user authentication
Object security
Data security
Security validation
Audit and logging
For each category, Snowflake provides extensive online documentation.
In the next chapter, you will learn about Snowflake’s unique capabilities for working with semistructured data formats such as JSON, XML, and Avro.