What is Snowpipe?
Snowpipe is an ingestion pipeline, an out-of-the-box component, ready to use when you purchase Snowflake Datawarehouse. Snowpipe lends itself well to real-time data requirements, as it loads data based on triggers and can manage vast and continuous loading.
It is a “COPY INTO” command continuously looking for fresh files to process them in micro-batches as soon as they arrive at the staging area. Snowpipe eliminates the requirement to build near-real-time pipelines to process data from external locations.
A few key points to note about Snowflake Snowpipe.
- Snowpipe combines a filename and a file checksum to ensure only “new” data is processed.
- Snowpipe uses serverless architecture and Snowflake-supplied compute resources, and you are billed accordingly.
- Snowpipe loads data in response to new file notification events. These notification events are usually configured in cloud platforms.
- We can also trigger a Snowpipe manually from Snowflake or through custom programs calling the REST APIs.
Implementation of Snowpipe in Snowflake
Snowpipe is an automated service that continuously listens for new data as it arrives in Amazon Web Services (S3) cloud storage; it loads that data into Snowflake. At a high level, the first option is to point a Snowpipe to a bucket in AWS S3. Then, you define event notifications on your S3 bucket and send these event notifications to Snowflake. As soon as new files land in the S3 bucket, those files are automatically picked up by Snowpipe and loaded into your target table.
Initially, you place the file in the S3 bucket, and as soon as the file is placed, an S3 notification is set up. This S3 notification is sent to an SQS Queue provided by Snowpipe. After receiving the notification, the load from the file to the target database begins.
Fig: 1 Data Ingestion
Setting Up Snowpipe
- To create an External stage that points to the S3 bucket in Snowflake :
- To define a target table for the Snowpipe to load data:
- To create a pipe with the auto_ingest option equaling true, which indicates that we want to use the automatically configured SQS queue :
- To look at the existing pipes:
- Cloud Storage Event Notifications (AWS S3, GCP CS, Azure Blob)
- Snowpipe’s REST API
- SNOWPIPE_STREAMING_CLIENT_HISTORY View (in Account Usage).
- SNOWPIPE_STREAMING_FILE_MIGRATION_HISTORY View (in Account Usage).
- METERING_HISTORY View (in Account Usage).
Advantages of Snowpipe
- Real-time insights: It constantly offers fresh business data across all departments while avoiding workload issues.
- Reduces costs: It is cost-effective and charges customers per second based on computing time.
- Usability: It is simple to use. All you have to do is connect it to the S3 bucket, and the data will instantly load.
- Adaptable: It is highly adaptable and enables simple customization to load data using a programmatic REST API, Python, and Python SDKs.
- Zero Management: It scales up or down instantly. There is nothing that needs to be managed.
This piece was written by Talati Adit Anil from Encora.
Fast-growing tech companies partner with Encora to outsource product development and drive growth. Contact us to learn more about our software engineering capabilities.