This post was initially published on the BizData blog.
A Snowflake Data Warehouse is a powerful and flexible cloud-based platform for handling your enterprise data. It offers a number of distinct advantages aimed at simplifying how you manage your business data, giving your organisation access to scalable storage and processing technology engineered specifically for the cloud. Capabilities such as virtual warehouses, concurrent processing and semi-structured data support are exciting to have at your disposal, especially when moving from conventional on-premises databases. However, getting to the point where you can take advantage of all these features is not always straightforward.
This post will cover the process of getting to an integrated and operational Snowflake data warehouse, showing you ways to overcome stumbling blocks you might have thought were unavoidable.
THE CHALLENGES OF DATA INGESTION
Depending on what your data ecosystem looks like, pulling everything into your new Snowflake database can involve numerous steps, stages and multiple moving parts to manage. This is especially the case if you want to ELT (Extract, Load and Transform) your CRM data without hand-coding connectors or relying on additional processing tools.
To manually ingest data into Snowflake, you will typically have to pull it from your CRM, identify the schema, transform it appropriately and use SQL to load it into Snowflake in the form that you need. As part of this, Snowflake ELT also typically involves staging the data in cloud object storage, such as Microsoft Azure Blob Storage or Amazon S3, which serves as a mid-point in the migration into the Snowflake database. While none of this is necessarily difficult for anyone familiar with databases, it can be tedious and involve several steps too many for anyone seeking to perform a large number of data migration tasks at scale and at speed.
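To make the manual process concrete, here is a sketch of the kind of SQL involved in the staging and loading steps. The stage name, container URL, SAS token and target table are all illustrative placeholders, not values from this walkthrough:

```sql
-- Point an external stage at the Blob container holding the CRM extracts
-- (storage account, container and SAS token are placeholders)
CREATE OR REPLACE STAGE crm_stage
  URL = 'azure://myaccount.blob.core.windows.net/crm-extracts'
  CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>');

-- Bulk-load the staged files into a pre-created target table
COPY INTO crm.accounts
  FROM @crm_stage/accounts/
  FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1);
```

Multiply this by every table in every source system — plus the extraction and transformation work that has to happen before the files even reach the stage — and the appeal of automating these steps becomes clear.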
With the benefits of Snowflake at your disposal, you’d be justified in wanting to fast-forward the setup and start reaping the operational benefits as quickly as possible.
ENTER DATA GOVERNOR ONLINE
Data Governor Online is a tool that helps you bridge the gap between your data sources and the Snowflake cloud, getting you up and running with minimum fuss. It has been designed to make extracting, staging, manipulating and loading data into a Snowflake Data Warehouse as painless as possible. It allows your enterprise data to be migrated into Snowflake without micromanaging intermediary data staging or manually coding each step of the ELT process.
Data Governor Online is a hub for a wide array of data sources, with Snowflake being one of over 80 native connection types available. This means you can quickly and easily manage data ingestion from a number of different sources, all from a single web-based application, allowing you to treat all of your data types in a consistent and governed manner.
THE DATA GOVERNOR MIGRATION PROCESS
With Data Governor Online the process of performing ELT and migrating your enterprise data into the Snowflake database can be done in a few intuitive steps through the GUI. We will cover all the steps below to show precisely how it works and the ease with which you can get Snowflake working for you.
The very first step is to configure Snowflake in the Connections menu. Simply by entering your Snowflake account details you will establish a secure and reliable connection between Data Governor Online and your data warehouse, allowing for smooth data transfer.
Once the connection is made, you can add a new Project from the Manage Projects button on the main page and call it “Snowflake”. This Project will contain every data task involving Snowflake, whether that is migrating, massaging or working with the data in any other way.
Since, at this point, we are concerned with data migration, create a Job within this project titled “Snowflake Migration”.
Within this Job you can quickly queue up a sequence of Tasks which will cover the actual data migration. On this screen you can see the versatility of the Data Governor Online model and the ease with which you can create a comprehensive data transformation pipeline.
A Task can engage one of the many pre-built connectors to pull in your existing enterprise data wherever it might be stored. Additionally, a Task can be set up to run scripts, services or jobs in a number of supported formats and services, including PowerShell, Python, Spark SQL and many others.
The execution order of these tasks can be rearranged simply by dragging and dropping them in the interface. Additionally, conditional execution can be enabled by setting up a Depends On qualifier while editing a Task and individual Tasks can be instantly toggled on or off by clicking the Enabled checkbox.
You can immediately see that just with these basic steps you can chain up branching tasks that will perform powerful data extraction and transformation, completely bypassing the time-consuming preparation that would normally be involved.
For the purposes of this example, we will be pulling in data from Microsoft Dynamics. To proceed, it is simply a case of selecting “Data Migration” as a Task Type and defining the Target and Source.
The Staging Connection covers the intermediary step of staging your data in Blob storage, as required by Snowflake. Typically, this involves a number of manual steps, especially when you are trying to preserve the database schema of the Source. Data Governor Online automates this step, freeing you to focus on the critical aspects of your data rather than being constrained by the requirements of the architecture.
Depending on your Source, Data Governor will infer the Schema and automatically copy it over as part of the migration process. If required, you can manually define the schema or create an entirely new custom one by enabling the Custom Output Schema toggle.
At this point, in the Design Migration screen you’ll have an opportunity to be selective with the Tables as well as the Columns you choose to migrate. You can choose as many or as few as is relevant using either the list, which is auto-populated based on your Source, or the Query screen which allows you to perform these tasks using SQL Queries.
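As an illustration of the Query-screen approach, a selective extract from the Dynamics account entity might look like the following. The columns and filter are standard Dynamics account fields, used here purely as an example:

```sql
-- Pull only the columns of interest, restricted to active accounts
SELECT accountid, name, revenue, createdon
FROM account
WHERE statecode = 0;
```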
After you have made the necessary selections you can run the Job, which will perform the Tasks, heeding the order and the dependencies with which they were defined. At this point you will see the Log screen, giving you a step-by-step account of what each Task performed as well as any issues that might have been encountered. This makes troubleshooting your data migration much easier, and you can go back to look at any existing logs associated with the selected Job by clicking the View Execution History button.
Once your data migration Job has completed, you can open the Snowflake Worksheets application to view the newly created schema and the data that has been successfully pulled into it.
At this point the data migration pipeline from Dynamics CRM into Snowflake has been set up and is at your disposal. Depending on requirements, you can now move on to perform more advanced data massaging and manipulation.
As mentioned above, Data Migration is far from the only Task type that can be queued up within a Job. By adding an SQL Statement Task you can immediately begin sculpting the data into structures directly relevant to your organisation’s requirements.
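As a simple sketch, an SQL Statement Task run against the tables migrated above might aggregate the raw CRM data into a reporting-friendly table. The schema, table and column names below are assumptions made for the sake of the example:

```sql
-- Summarise the migrated accounts into a reporting table
CREATE OR REPLACE TABLE crm.account_summary AS
SELECT ownerid,
       COUNT(*)     AS account_count,
       SUM(revenue) AS total_revenue
FROM crm.accounts
GROUP BY ownerid;
```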
Keep an eye out for the next blog in this series on Snowflake data migration. Part 2 will cover what Data Governor Online can achieve regarding data staging and transformation.