Modern Data Stack (MDS): What It Is and How to Build It

Data is being generated at an unprecedented rate, and processing it to extract business value is becoming more important every day for companies seeking a competitive advantage. But how can you extract value from data without investing vast amounts of money upfront?

In the past, extracting value from data required heavy investments in infrastructure, specialized skills, and tools. Only a few companies could analyze and process large volumes of data, because doing so meant paying for top-tier infrastructure and buying software from the big players in the market. Once you committed to a vendor, data loading, transformation, and analysis were usually possible only by buying more tools from that same vendor.

The advent of cloud computing and open-source software made it possible to solve this problem. The resulting approach is called the Modern Data Stack (MDS). It is supported by a flourishing market of SaaS solutions focused on making the task of storing, processing, and analyzing data much easier and more affordable.

What Is the Modern Data Stack?

Built to enable companies of all sizes to work with data, the MDS is a combination of best practices, tools, and architecture principles that aims to simplify how an organization addresses its data needs. It leverages cloud computing to reduce infrastructure costs while increasing flexibility and scalability, and it combines SaaS tools, platforms, and patterns for data integration so that multiple data sources can be consolidated in a single location. Driven by the scalability and cost-effectiveness of cloud data warehouses, it lowers the barrier to data integration and requires little technical configuration from users and infrastructure architects.

What makes it a modern approach is the use of innovative and specialized solutions: fully managed ELT data pipelines, a cloud-based Data Warehouse or Data Lake/Lakehouse as a destination, a Data Transformation tool, and a business intelligence or data visualization platform.

The MDS Building Blocks (Architecture)

The Modern Data Stack separates layers with different responsibilities (Data Ingestion, Transformation, Storage, and Visualization) and integrates them so that you can extract the most value from each component. Beyond cloud-based infrastructure, it also relies on processes guided by Data Catalog, Lineage, and Governance, in addition to Privacy, Security, and Compliance. This separation also allows you to choose state-of-the-art software for each component.

Data Ingestion

A variety of data extraction and ingestion tools allow collection of data from internal and external sources. These tools support many source types and formats, such as CSV files, spreadsheets, web and mobile applications, social media, documents, marketing tools, and others. Some tools can also manage both streaming and batch data processing.

As an example, if you need event tracking data, it is easy to add a SaaS event collector to the stack. The use of specialized tools enables fast data extraction and implementation, empowering data analysts to quickly look for insights in the data.
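
To make the ingestion step more concrete, here is a minimal sketch of loading a CSV export unchanged into a raw table. It assumes an in-memory SQLite database as a stand-in for a cloud warehouse; the sample data, the `ingest_csv` function, and the `raw_orders` table name are hypothetical and only serve as illustration. A managed ingestion tool would handle this for you.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system; a real pipeline would read a file or an API.
RAW_CSV = """order_id,customer_id,amount
1,c-01,19.90
2,c-02,5.00
3,c-01,42.50
"""


def ingest_csv(conn: sqlite3.Connection, csv_text: str, table: str) -> int:
    """Load CSV rows unchanged into a raw table and return the number of rows loaded."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # Untyped columns keep the raw values as text; typing happens later, in transformation.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_list})')
    rows = [tuple(row[c] for c in columns) for row in reader]
    conn.executemany(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # assumption: stand-in for a cloud data warehouse
loaded = ingest_csv(conn, RAW_CSV, "raw_orders")
print(f"loaded {loaded} rows into raw_orders")
```

Loading the data as-is and postponing any cleanup is what distinguishes the ELT pattern from traditional ETL.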

Data Storage

Data Storage is the heart of the entire Data Stack. It is where data from various sources is centralized and consolidated for analytical purposes. Different approaches are possible; three of the most common are depicted below.

[Figure: three common data storage approaches]

Data Transformation

Combining and summarizing data from various sources is the main task here, so that metrics and data models can be built to enable business insights. These tools retrieve, transform, and move data back into storage. Many of them process the data using the storage engine's own processing capabilities, while others need data to be transferred out of and back into storage for processing. As with the other building blocks above, there is a variety of tools to choose from to suit the unique needs of each business.
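
As an illustration of a transformation that runs inside the storage layer, the sketch below builds a summary table with plain SQL. It again assumes an in-memory SQLite database in place of a cloud warehouse, and the `raw_orders` and `customer_revenue` tables are hypothetical examples rather than part of any specific tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: stand-in for a cloud data warehouse
conn.executescript("""
    CREATE TABLE raw_orders (order_id INTEGER, customer_id TEXT, amount REAL);
    INSERT INTO raw_orders VALUES
        (1, 'c-01', 19.90),
        (2, 'c-02', 5.00),
        (3, 'c-01', 42.50);
""")

# The transformation is plain SQL executed by the database engine itself,
# so no rows leave the storage layer.
conn.executescript("""
    DROP TABLE IF EXISTS customer_revenue;
    CREATE TABLE customer_revenue AS
    SELECT customer_id,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount
    FROM raw_orders
    GROUP BY customer_id;
""")

for row in conn.execute("SELECT * FROM customer_revenue ORDER BY customer_id"):
    print(row)
```

Dropping and recreating the summary table keeps the step repeatable, which is convenient for a sketch; on large tables an incremental strategy may be preferable.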

Data Analysis Tools

After loading, transforming, and storing the data in the warehouse, it is time to analyze the data and gain insights that extract value from it. In the MDS, multiple tools are available to help with this task. 

Data Visualization tools enable interactive dashboards and visual analytics to better understand the data while enabling analysts to present their findings in an easy-to-understand format. They also make it easier for stakeholders to understand data and derive insights.

Tools aimed at more advanced users can also help accelerate the analysis by enabling descriptive statistics or regression analysis. Advanced analytics are also possible: many of the tools currently available support machine learning, natural language processing, predictive modeling, and more.
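
For readers who want to see what descriptive statistics and a simple regression look like in practice, here is a small sketch using only the Python standard library. The monthly figures and the `fit_line` helper are hypothetical; an analytics tool would typically offer the same operations through its interface.

```python
from statistics import mean, median, stdev
from typing import Sequence, Tuple

# Hypothetical monthly figures: marketing spend and revenue, in thousands.
spend = [10.0, 12.0, 15.0, 18.0, 20.0]
revenue = [25.0, 29.0, 35.0, 41.0, 45.0]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Descriptive statistics summarize a single metric.
print("mean:", mean(revenue), "median:", median(revenue), "stdev:", stdev(revenue))

# Regression relates two metrics to each other.
slope, intercept = fit_line(spend, revenue)
print(f"revenue is roughly {slope:.2f} * spend + {intercept:.2f}")
```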

Data Governance and Quality

Because the tools in the stack largely manage themselves, data and product teams can focus on data governance and monitoring, and therefore on business needs and data reliability. This work is supported by alerts and logging, as well as data quality and lineage methodologies.

As compliance requirements and rules such as HIPAA, PCI DSS, CCPA (California Consumer Privacy Act), and GDPR have been strengthened, it is necessary to take additional care of the security and privacy of customers’ data. There are tools that can generate universally unique identifiers or hashed fields from the collected data, which helps with compliance with such regulations. There are also tools that adhere to these regulations by themselves, eliminating the need for additional processing of the data.
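
The idea of hashed fields and unique identifiers can be sketched as follows. This is a minimal example under stated assumptions: the `SECRET_KEY` value, the `hash_field` and `pseudonymize` functions, and the sample event are hypothetical, and hashing on its own does not make a dataset compliant with any particular regulation.

```python
import hashlib
import hmac
import uuid

# Assumption: in practice the key comes from a secrets manager; it is hard-coded here for illustration only.
SECRET_KEY = b"replace-with-a-managed-secret"


def hash_field(value: str, key: bytes = SECRET_KEY) -> str:
    """Keyed SHA-256 (HMAC) of a normalized value; the same input always yields the same token."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def pseudonymize(record: dict, sensitive_fields: set) -> dict:
    """Return a copy with sensitive fields hashed and a random surrogate ID added."""
    cleaned = {
        name: hash_field(str(value)) if name in sensitive_fields else value
        for name, value in record.items()
    }
    cleaned["record_id"] = str(uuid.uuid4())  # random identifier, unrelated to the person
    return cleaned


event = {"email": "Jane.Doe@example.com", "plan": "pro"}
safe_event = pseudonymize(event, {"email"})
print(safe_event)
```

A keyed hash is used here instead of a plain one so that tokens cannot be reproduced by someone who only knows the original values. Because the output is deterministic, the hashed field can still be used to join records across tables.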

Regarding quality, data can be tested with automated tests, machine learning or statistical methods based on historical behavior, infrastructure monitoring, or a mix of all of them. Testing data for accuracy, consistency, and reliability (for example, checking for duplicate records, null values, invalid values, broken data relationships, and deviations from historical behavior) enables business owners to validate whether the data matches the business needs.
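
Three of the checks mentioned above (duplicate records, null values, and data relationships) can be sketched in a few lines. The sample rows and the helper functions `duplicate_keys`, `null_count`, and `orphan_keys` are hypothetical; dedicated data quality tools usually let you declare such tests instead of coding them by hand.

```python
from collections import Counter

# Hypothetical rows, shaped like the result of a warehouse query.
customers = [{"customer_id": "c-01"}, {"customer_id": "c-02"}]
orders = [
    {"order_id": 1, "customer_id": "c-01", "amount": 19.9},
    {"order_id": 2, "customer_id": "c-02", "amount": None},
    {"order_id": 2, "customer_id": "c-03", "amount": 5.0},
]


def duplicate_keys(rows, key):
    """Values of `key` that appear in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def null_count(rows, column):
    """Number of rows where `column` is missing a value."""
    return sum(1 for row in rows if row[column] is None)


def orphan_keys(child_rows, parent_rows, key):
    """Values of `key` in the child rows that have no matching parent row."""
    parents = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - parents)


report = {
    "duplicate_order_ids": duplicate_keys(orders, "order_id"),
    "null_amounts": null_count(orders, "amount"),
    "orphan_customer_ids": orphan_keys(orders, customers, "customer_id"),
}
print(report)
```

Each check returns the offending values rather than a simple pass or fail, which makes it easier to raise a meaningful alert.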

How to Build a Modern Data Stack

These are some key aspects to consider when planning and executing a strategy based on a modern data stack:

  • Know your budget - Some tools offer a free or nearly free trial, which is great for an MVP (Minimum Viable Product) but will later demand resources (financial and/or technical). For those trial plans, know the maximum number of events allowed and be careful not to exceed this threshold in your MVP.
  • Know the compliance rules related to your business - Some businesses are subject to specific regulations and must ensure data is secured while stored or transmitted on cloud servers. Check for sensitive data such as medical, financial/tax, or communications data. Some services may have to be avoided.
  • Understand the team skills and capacity - Is there expertise in SQL, programming, and cloud artifacts, for example? Is there time to train or is a quick approach needed?
  • Know the type of data to be collected for business needs - Do you need to work with relational or non-relational data? Which Extraction and Integration tools best fit the requirements?
  • Data storage strategy - Decide where data from the various sources will be centralized and consolidated: will you use a Data Lake or Data Warehouse?  What are the business needs? Which type of data modeling will be necessary?
  • Maturity evaluation - Understand the current and desired data maturity level of the organization and consider it while selecting the components of the data stack. Is there a well-defined architecture in place? What kind of security measures are important? Is the data quality an issue? Is there any formal data governance in place?

Conclusion

If you need a data stack that is flexible and scalable, and that enables you and your team to focus on data analysis instead of dealing with complex infrastructure, the Modern Data Stack is the way to go. Many tools are available to deal with a variety of requirements, and our team can help with selecting the ones that best fulfill your needs.

Acknowledgment

This article was written by Marlon Weiss Hoffmann, Marcio Fabiano dos Santos, and Matheus Bedini Passini. Thanks to Ivan Caramello, Renato Pezzotti, and João Caleffi for reviews and insights.

