Breaking the Monolith: A Tale of ELT Job Optimization in Snowflake

In our previous Snowflake performance-related blog “Mastering Snowflake Performance Tuning and Cost Optimization: Encora's Expertise”, we delved into the expertise of Encora's Snowflake Practitioners Group, showcasing how they harnessed the power of Snowflake's virtual warehouses to dramatically reduce ELT run times and credit consumption. Snowflake, with its unique architecture that separates storage and compute, is the driving force behind its ability to meet the high data volume processing demands of modern organizations. Now, let's explore another intriguing journey of Snowflake optimization. This time, it's a monolithic ELT job that once consumed a 2XL warehouse for a daunting 40 minutes. The transformation of this colossal task not only reduced processing time but also significantly slashed credit consumption. Buckle up as we uncover the secrets of breaking the monolith and optimizing ELT in Snowflake.

The Snowflake Advantage

Snowflake, with its innovative architecture that separates storage and compute, has become the go-to solution for organizations dealing with high data volume processing. This unique design allows organizations to scale their compute resources independently of their storage needs, providing unmatched flexibility and efficiency in managing large datasets.

The Snowflake Practitioners Group at Encora

Encora's Snowflake Practitioners Group comprises a dedicated team of experts who have mastered the intricacies of Snowflake's virtual warehouses. They understand that effective utilization of Snowflake's powerful features is the key to unlocking cost control, enhancing query performance, and optimizing resource allocation. This group brings together years of hands-on experience, industry insights, and a commitment to staying at the forefront of Snowflake's capabilities.

The Foundation: Snowflake Warehouse Considerations

To grasp the success of the Snowflake Practitioners Group, we first need to explore the foundational principles of Snowflake warehouse considerations. The group's expertise lies in their deep understanding of these key concepts:

  1. Experimentation: Snowflake's flexibility allows experimentation with different warehouse sizes and types. This approach empowers organizations to tailor their warehouses to match specific query needs and workload requirements.
  2. Credit Charging: Understanding Snowflake's credit billing structure is essential for cost management. The team is well-versed in how credits are charged, enabling effective credit consumption control.
  3. Query Composition: The Practitioners Group recognizes that the complexity and size of queries play a significant role in determining the required warehouse resources. By optimizing query composition and executing similar queries on the same warehouse, they achieve better performance.
  4. Warehouse Caching: Balancing the need to save credits by suspending warehouses with maintaining the cache for improved performance is a skill the group has mastered.
  5. Scaling Up vs. Scaling Out: Snowflake offers two approaches to scaling warehouses: scaling up by resizing a warehouse and scaling out by adding clusters to a multi-cluster warehouse. The group's expertise helps organizations choose the right approach for their specific workloads.

For further reading please visit Snowflake’s Warehouse Considerations page here: https://docs.snowflake.com/en/user-guide/warehouses-considerations

The Monolithic ELT Job

The ELT job in question was a formidable one, designed to generate a flattened summary table. It posed unique challenges:

  • Single SQL with Interdependent CTEs: The ELT job consisted of a single SQL statement with several interdependent Common Table Expressions (CTEs), making it complex and resource-intensive.
  • Joining Vast Amounts of Data: It joined several facts and dimensions, processing several hundred gigabytes of data.
  • Aggregations and Pivots: Multiple levels of aggregations and pivots added complexity to the job.
  • Point-in-Time Analysis Joins: Point-in-time analysis joins led to an explosive fanning-out of records, followed by sorting and ranking operations.

Tuning the Monolith

To tackle this monolithic ELT job, the experts at Encora's Snowflake Practitioners Group devised a series of innovative strategies:

  1. Splitting the SQL: The monolithic SQL was split into several smaller jobs, each addressing a portion of the task.
  2. Transient Tables: Smaller jobs used transient tables for intermediate data storage, reducing the need for extensive data shuffling.
  3. Execution Dependencies: The ELT execution framework maintained execution dependencies, ensuring the smaller jobs ran in the correct order.
  4. Optimal Warehouse and Cluster Selection: Appropriate warehouse sizes and cluster numbers were chosen for each job. This allowed multiple jobs to run in parallel, boosting efficiency.
  5. Drastic Reduction in Processing Time: The overall processing time of the monolithic ELT job was slashed from a daunting 40 minutes to a swift ~10 minutes.
  6. Reduced Resource Requirements: Smaller jobs had significantly reduced resource requirements, ensuring that clusters were available for other ELT processes to run concurrently.
  7. Credit Consumption: The transformation wasn't just about time; credit consumption for this process alone decreased by approximately 60%.

 

Encora's Observability and Monitoring Offering

In today's data-driven world, organizations rely heavily on ELT (Extract, Load, Transform) workloads for processing and analyzing their data. However, managing these workloads efficiently and optimizing their performance can be a daunting task. That's where Encora's Observability and Monitoring offering comes into play.

Encora's Observability and Monitoring service is designed to provide organizations with the tools and insights they need to continuously monitor their ELT workloads and overall data health. By leveraging the power of Gen-AI, this service not only tracks the performance of ELT pipelines but also identifies areas for potential performance tuning, which can lead to significant cost savings.

Key Features of Encora's Observability and Monitoring Service

  1. Continuous Monitoring: Our service keeps a watchful eye on your ELT pipelines, ensuring that they are running smoothly and efficiently. Any deviations from the norm are quickly detected and addressed.

 

  1. Performance Tuning Opportunities: Gen-AI, our artificial intelligence framework, analyzes your ELT workloads to pinpoint areas where performance can be improved. Whether it's optimizing query execution, fine-tuning resource allocation, or streamlining data transformations, we provide actionable recommendations.

 

  1. Cost Savings Possibilities: By identifying performance bottlenecks and resource inefficiencies, our service can help you reduce credit consumption and overall operational costs.

 

  1. Ongoing Monitoring: ELT pipelines evolve over time, and so do their performance requirements. Our service adapts to these changes and ensures that your data processing remains at peak efficiency.

 

With Encora's Observability and Monitoring service, you can gain unprecedented visibility into your ELT workloads, make data-driven decisions, and ensure that your data pipelines are not only performing optimally but also cost-effective.

 

Closing Remarks

The transformation of a monolithic ELT job, from a 2XL warehouse hogging 40-minute marathon to a lean ~10-minute sprint, illustrates the remarkable potential of Snowflake's optimization capabilities. Encora's Snowflake Practitioners Group, with their expertise and innovative strategies, exemplify how organizations can harness the full power of Snowflake.

This journey is a testament to the fact that even the most complex ELT processes can be streamlined for optimal performance and resource utilization. By breaking down monolithic tasks, leveraging transient tables, and parallelizing smaller jobs, organizations can not only save time but also significantly reduce credit consumption.

The path to optimization is not a one-time endeavor but a continuous voyage. As more data jobs are added and volumes grow, the need for ongoing monitoring and tuning becomes essential. Encora's Snowflake Practitioners Group, in collaboration with our Observability and Monitoring service powered by Gen-AI, is your steadfast companion on this journey. Together, we can ensure your data processing remains at peak efficiency and cost-effectiveness.

So, embrace the journey, explore the possibilities, and unlock the true potential of Snowflake with Encora, one query at a time.

Acknowledgement

This piece was written by Kedarnath Waval, Solution Architect in Encora's Snowflake Practitioners Group.

New call-to-action

About Encora

Fast-growing tech companies partner with Encora to outsource product development and drive growth. Contact us to learn more about our software engineering capabilities.

 

 

Share this post

Table of Contents