
Snowflake Overview

A data warehouse is a critical part of any business organization. Many cloud-based data warehouses are available on the market today; this article focuses on Snowflake.

Snowflake is an analytical data warehouse provided as Software-as-a-Service (SaaS). Built on a new SQL database engine, it has a unique architecture designed for the cloud. It stands out among other enterprise data warehouses for the range of features it provides, and because it is a SaaS offering, it is more flexible than traditional data warehouse offerings.

The distinctive features of Snowflake as a cloud service are as below:

Here is a closer look at the architecture of Snowflake. The architecture has three layers:
In the database storage layer, Snowflake reorganizes loaded data into its internal optimized, compressed, columnar format and stores this optimized data in cloud storage. This layer handles all aspects of data storage: organization, file size, structure, compression, metadata, and statistics. The data objects are accessible only through SQL query operations run using Snowflake.
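To give a feel for why a columnar, compressed layout suits analytics, here is a toy sketch in Python. It is not Snowflake's actual storage format, which is internal and not described in this article; the sample rows and the helper names `to_columns` and `run_length_encode` are assumptions made purely for illustration. The idea is that values from one column tend to repeat, so storing them together makes simple compression effective.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical sample data, stored row by row as it might arrive.
rows = [
    {"region": "EU", "status": "shipped", "amount": 10},
    {"region": "EU", "status": "shipped", "amount": 25},
    {"region": "US", "status": "pending", "amount": 25},
]

# Column-oriented view of the same data, then a simple compression pass.
columns = to_columns(rows)
encoded = {name: run_length_encode(values) for name, values in columns.items()}
```

A query that only needs `region` can read that one column and skip the rest, which is the main reason analytical warehouses favor this kind of layout.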
Query processing in Snowflake is done using virtual warehouses. A virtual warehouse is a massively parallel processing (MPP) compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider. Each virtual warehouse acts independently and does not share compute resources with other virtual warehouses. As a result, one warehouse has no impact on the performance of another.
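As a rough illustration of this isolation, the sketch below builds Snowflake `CREATE WAREHOUSE` statements for two separate workloads. The warehouse names `ETL_WH` and `BI_WH`, the sizes, and the helper `create_warehouse_sql` are assumptions for the example, not part of the original article; the script only produces SQL text and does not connect to Snowflake.

```python
import re

# Subset of warehouse sizes accepted here; kept small on purpose.
ALLOWED_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


def create_warehouse_sql(name, size="XSMALL", auto_suspend_seconds=60):
    """Return a CREATE WAREHOUSE statement for one independent warehouse."""
    # Accept only simple unquoted identifiers so the name cannot inject SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_$]*", name):
        raise ValueError(f"invalid warehouse name: {name!r}")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported warehouse size: {size!r}")
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {int(auto_suspend_seconds)} "
        "AUTO_RESUME = TRUE "
        "INITIALLY_SUSPENDED = TRUE"
    )


# Hypothetical setup: heavy loads and dashboards each get their own compute.
statements = [
    create_warehouse_sql("ETL_WH", size="LARGE"),
    create_warehouse_sql("BI_WH", size="SMALL", auto_suspend_seconds=300),
]
```

Because each statement defines its own cluster, a long-running load on one warehouse would not be expected to slow queries running on the other.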
The cloud services layer is a collection of services that coordinate activities across Snowflake. It ties together the different components of Snowflake to process user requests, from login to query dispatch. The services provided in this layer include authentication, infrastructure management, metadata management, query parsing and optimization, and access control.

Snowflake Data Accelerators

Data can be analyzed quickly only if it is first loaded without dependencies or delays. Once the data is loaded, machine learning and predictive analysis can be applied to produce useful insights. This process typically requires a dedicated resource for ELT development and operations, a workflow that increases cost, introduces redundant effort, and creates inconsistency.

The data ingestion cycle usually comes with a few challenges: high data ingestion cost, long wait times before analytics can be performed, varying standards for data ingestion, quality assurance and business analysis of data that are not sustained, and changes that are costly and slow to execute.

To overcome these challenges, one can create a sustainable data ingestion framework that brings reusability across the enterprise and provides a consistent standard, eliminating the need to maintain several projects or programs.

The Data Load Accelerator is one such solution. It provides an intelligent framework that can reduce or eliminate ELT coding effort, consolidate management, shorten development cycles, and support complicated data load requirements.

The key features of the Data Load Accelerator include:
The Data Load Accelerator works with the cloud storage layer to ingest data into Snowflake. The accelerator provides two executable components that can run with a dependency between them or be decoupled.
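The article does not show how the accelerator is implemented, so the following is only a minimal sketch of what a reusable, metadata-driven load step might look like. It turns a list of load specifications into Snowflake `COPY INTO` statements. The `LoadSpec` class, the stage name `LANDING_STAGE`, and the table names are all hypothetical, and the script only generates SQL text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadSpec:
    """One entry of load metadata: where the files are and where they go."""
    table: str
    stage_path: str       # named stage plus path, without the leading '@'
    skip_header: int = 1  # number of header lines to skip in each CSV file


def copy_into_sql(spec):
    """Build a COPY INTO statement for CSV files described by one spec."""
    return (
        f"COPY INTO {spec.table} FROM @{spec.stage_path} "
        f"FILE_FORMAT = (TYPE = CSV SKIP_HEADER = {spec.skip_header}) "
        "ON_ERROR = 'ABORT_STATEMENT'"
    )


# Hypothetical metadata: adding a feed means adding a row, not writing new code.
specs = [
    LoadSpec(table="SALES.ORDERS", stage_path="LANDING_STAGE/orders/"),
    LoadSpec(table="SALES.CUSTOMERS", stage_path="LANDING_STAGE/customers/", skip_header=0),
]
statements = [copy_into_sql(spec) for spec in specs]
```

Keeping the load definitions as data rather than hand-written scripts is one way to get the consistent standard and reusability described earlier.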
For seamless maintenance and support, automated reports collect Talend and Snowflake logs to provide information on data load statistics, errors, and audit findings. The accelerator is designed to auto-detect past failure instances and customize the workflow in real time.

D-Fast: RandomTrees Snowflake accelerator

RandomTrees has developed an accelerator named D-Fast that can help fast-track data migration during a Snowflake implementation, with an approach focused on data quality, cost-effectiveness, and business value.

The key features include:
Article written by Sindhu Shree