Zero Copy Cloning in Snowflake

Talati Adit Anil | January 20, 2023

Zero Copy Cloning means “Copying without actually copying!”

Before we go deeper into this feature of Snowflake, let’s step back and talk about what Snowflake is.

Introduction to Snowflake

Snowflake is a cloud-based data warehouse that offers a single platform for all data needs. This includes data warehousing, data lakes, data engineering, data science, and application development. Additionally, Snowflake offers a “pay as you go” pricing model, which allows users to only pay for the storage and computing resources they use. Its architecture is defined on a Shared-disk and Shared-nothing database architecture and has three architectural layers:

  1. Cloud Services: The coordinator and collection of Services
  2. Query Processing: The brain of the system, where query execution is performed using “virtual warehouses”
  3. Database Storage: This physically stores the data in Columnar mode


Picture1-Jan-20-2023-10-12-22-2328-PM

Source: Snowflake Documentation

 

Snowflake offers a wide variety of features like Near-Zero Administration, Real-Time Data Collection, recovering objects using undrop, sharing data between accounts, Storage & Compute isolation, JSON support with SQL, Auto Encryption, Auto Query Optimization, and the very special feature, Zero Copy Cloning.

 

What Is Zero Copy Cloning?

Cloning is not a new concept, with one of the best examples being - git clone. 

It is a common concept for databases too, like Oracle, DB2, and SQL Server. We can clone data by copying, which has multiple disadvantages:

  1. Data duplication
  2. Time consuming to replicate data (If the volume is high)
  3. Data sync
  4. Additional storage needs 

How Does Snowflake Clone Data So Fast?

In the traditional approach, cloning creates a real copy of the database which consumes space. 

Snowflake instead, has a wonderful utility called Cloning (they call it Zero Copy Clone) which can clone a database/table seamlessly with a simple command. It is very similar to Pointers that are just a reference to another variable; instead of actually being the variable. When a command is given to clone a database, it does the following:

  1. Creates the new Database (as Object Name)
  2. All objects underneath the database are created

 

One could expect the data to be held as a “replica.” But it’s not. What Snowflake smartly does is create a pointer/reference to the source database/tables from the cloned object. When a user queries a table from the cloned object, the cloud service simply fetches the data from the actual source, making the data as up-to-date as possible. 

Snowflake does all this by playing around with the metadata. When you create a clone for your data, Snowflake, instead of copying data, creates new metadata information pointing to the data of interest. All these operations are performed by Snowflake’s global services layer. This means you can literally clone terabytes of data within seconds without incurring in computation or storage costs.  

The entire operation has zero cost, hence the name - Zero Copy Clone.

 

How Does Snowflake Handle Update Operations?

Once written, Snowflake micro-partitions are immutable and cannot be modified.  This means, when a user executes an UPDATE operation, instead of finding and updating the data in place, Snowflake simply creates a new version of the relevant micro-partitions with the changed data.

Example: 
As a first step, let us create a clone of production data (Table_A) to the stage environment with just a simple command as shown below:

Picture2-Jan-20-2023-10-15-15-5666-PM

With the above command, TABLE_A_CLONE is created in the stage environment with the same data available at the time of running this query in the table named TABLE_A.  TABLE_A_CLONE is the same as any other table & supports all kinds of DML and DDL operations. Snowflake’s underlying architecture stores data in smaller blocks called micro-partitions. So essentially, your clone is just a new set of metadata that is pointing to the same micro-partitions that holds the data of Table_A (as depicted below).

 

Picture3-Jan-20-2023-10-16-28-8040-PM

Now let us say due to activity in the stage environment, some of the data of table TABLE_A_CLONE gets modified. All this modified data belongs to Micro-Partition-3 (MP-3). Since the same micro-partition (MP-3) is shared by Table_A. How does Snowflake deal with this change? It does not copy all micro-partitions, instead it copies that modified micro-partition alone and creates a new micro-partition (MP-4) and assigns this one to the stage environment alone; as this change is owned by TABLE_A_CLONE alone.

Micro-Partitions in Snowflake are immutable, meaning that once created, they last in the same state until the table is dropped. Hence, the change in stage environment is captured separately and metadata points to the newly created micro-partition and updates TABLE_A_CLONE. So, additional storage costs are levied only for the modified data but not for the complete clone. This change is depicted as below.

Picture4-Jan-20-2023-10-17-01-2552-PM

How Does Access Control Work with Cloned Objects?

A cloned object does not automatically get copy privileges from the source object during cloning. A system administrator, or the owner of the cloned object, must explicitly grant any required privileges to the newly-created clone.

However, if the source object is a database or schema, for child objects contained in the source, the clone replicates all granted privileges on the corresponding child objects. To create a clone, your current role must have the following privilege(s) on the source object:

•    Tables: SELECT
•    Pipes, Streams, Tasks: OWNERSHIP
•    Other objects: USAGE

 

Advantages of Zero Copy Clone

To summarize, here are the key advantages of zero copy cloning:

  1. Almost no additional storage costs for cloning data
  2. No waiting time for cloning data from one environment to another
  3. No need for administrative efforts, as cloning is as simple as a Select
  4. Since data exists only in one place, it’s easy to maintain
  5. Instantly promotes corrected/fixed data to production

 

Conclusion

Zero-Copy Cloning can avoid creating multiple development environments in separate accounts. Until now, the typical development team used to create separate production, development and test environments, which was costly and time consuming. Creating on large copies of production tables was a tremendous task. The alternative offered by companies like Snowflake Zero Copy Cloning can not only save you time, but it will also save you a lot of money!

 

About Encora

Fast-growing tech companies partner with Encora to outsource product development and drive growth. Contact us to learn more about our software engineering capabilities.

Insight Content

Share this Post

Featured Insights