Unstructured data storage is growing more popular as storage costs fall and become less of a constraint. Rather than being limited by the schema and rules of a structured database, users are finding it easier to store data as blobs and later retrieve the data they need in the required format. This post covers unstructured data stored in blobs, Azure Data Lake Storage Gen2, and how both can be used in Data Governor Online.
A blob is the standard way to store data in an unstructured manner. Cloud platforms store blobs in collections known as buckets or containers. This is similar to files organised into folders: the file is the blob, and the folder is the bucket or container. When a large set of blobs is organised in a meaningful and usable way, the result is a data lake. Blobs can then be retrieved from the data lake and processed into a format that is ready for use. Popular blob storage services include Azure Blob Storage, Amazon S3, and Google Cloud Storage.
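To make the folder analogy concrete, here is a minimal in-memory sketch of flat blob storage. `FlatBlobStore` and its methods are hypothetical and do not mirror any provider's API. In a flat namespace, a "/" inside a blob name is just part of the name, so a "folder" is only a shared name prefix, and listing one is a prefix match.

```python
from collections import defaultdict


class FlatBlobStore:
    """Hypothetical in-memory model of flat blob storage.

    Containers hold blobs keyed by name. A "/" inside a blob name is just part
    of the name, so "folders" are virtual prefixes rather than real objects.
    """

    def __init__(self) -> None:
        # container name -> {blob name -> blob contents}
        self._containers: dict[str, dict[str, bytes]] = defaultdict(dict)

    def put(self, container: str, name: str, data: bytes) -> None:
        self._containers[container][name] = data

    def get(self, container: str, name: str) -> bytes:
        return self._containers[container][name]

    def list_blobs(self, container: str, prefix: str = "") -> list[str]:
        # Listing a "folder" is a prefix match over every blob name in the container.
        return sorted(n for n in self._containers[container] if n.startswith(prefix))


if __name__ == "__main__":
    store = FlatBlobStore()
    store.put("sales", "2024/orders.csv", b"id,total\n1,9.99\n")
    store.put("sales", "2024/customers.csv", b"id,name\n1,Ann\n")
    store.put("sales", "2023/orders.csv", b"id,total\n")
    print(store.list_blobs("sales", prefix="2024/"))  # ['2024/customers.csv', '2024/orders.csv']
```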
Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 (ADLS Gen2) builds on Azure Blob Storage, adding features designed for data lakes and big data workloads. These features help prevent problems that can arise when a data lake lacks the structure and validation that conventional data warehouses enforce. ADLS Gen2 also aims to simplify data lake creation by organising data into directories. This feature, called a hierarchical namespace, arranges files into a hierarchy of directories so they are easy to find and access. Data Governor Online supports the hierarchical file system of ADLS Gen2, which greatly simplifies directory management.
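One practical difference a hierarchical namespace makes is how directory operations behave. Microsoft documents that, with the hierarchical namespace enabled, renaming or deleting a directory becomes a single atomic metadata operation. In a flat namespace, the same change means touching every blob under the prefix. The toy model below contrasts the two approaches. The function names and the nested-dict tree are illustrative assumptions, not how ADLS Gen2 is implemented.

```python
def rename_prefix_flat(blobs: dict[str, bytes], old: str, new: str) -> int:
    """Flat namespace: "renaming a folder" means renaming every blob under it.

    Returns how many blobs had to be touched.
    """
    affected = [name for name in blobs if name.startswith(old + "/")]
    for name in affected:
        blobs[new + name[len(old):]] = blobs.pop(name)
    return len(affected)


def _walk_to_parent(root: dict, path: str) -> tuple[dict, str]:
    """Return (parent directory node, final path segment) for a "/"-separated path."""
    *dirs, leaf = path.split("/")
    node = root
    for d in dirs:
        node = node[d]
    return node, leaf


def rename_dir_hierarchical(root: dict, old: str, new: str) -> int:
    """Hierarchical namespace: a directory is a real node, so renaming it
    re-links a single entry, however many files it contains.

    `root` is a nested dict (name -> dict for a subdirectory, bytes for a file).
    The destination's parent directory must already exist.
    Returns how many namespace entries were touched (always 1).
    """
    old_parent, old_leaf = _walk_to_parent(root, old)
    new_parent, new_leaf = _walk_to_parent(root, new)
    new_parent[new_leaf] = old_parent.pop(old_leaf)
    return 1


if __name__ == "__main__":
    flat = {"raw/2024/a.csv": b"", "raw/2024/b.csv": b"", "raw/2023/c.csv": b""}
    print(rename_prefix_flat(flat, "raw/2024", "curated/2024"))  # 2 blobs rewritten

    tree = {"raw": {"2024": {"a.csv": b"", "b.csv": b""}, "2023": {"c.csv": b""}}, "curated": {}}
    print(rename_dir_hierarchical(tree, "raw/2024", "curated/2024"))  # 1 entry re-linked
```

The flat cost grows with the number of blobs under the prefix, while the hierarchical cost stays constant. This is the kind of directory management the hierarchical namespace simplifies.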
Data Governor Online is a Software as a Service (SaaS) data orchestration tool for migrating and transforming data through a straightforward web-based interface. It makes it easy to create data migrations into Azure Blob Storage and ADLS Gen2 from any of the supported data sources. You can also schedule and automate these migrations to run regularly at times that best suit you and your organisation.
Data Governor Online also supports incremental loading during data migrations: based on a condition, it pulls in only new data rather than the entire table or source.
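A common way to express the condition behind incremental loading is a watermark. Each run remembers the highest "last modified" value it has seen, and the next run pulls only rows above it. The sketch below assumes a monotonically increasing `modified` column. `Row` and `incremental_pull` are hypothetical names, and the example is not a description of how Data Governor Online configures its condition.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Row:
    id: int
    modified: datetime  # assumed: set whenever the row is inserted or updated
    value: str


def incremental_pull(rows: list[Row], last_watermark: datetime) -> tuple[list[Row], datetime]:
    """Return only rows modified after last_watermark, plus the watermark for the next run."""
    new_rows = [r for r in rows if r.modified > last_watermark]
    next_watermark = max((r.modified for r in new_rows), default=last_watermark)
    return new_rows, next_watermark


if __name__ == "__main__":
    source = [
        Row(1, datetime(2024, 1, 1), "a"),
        Row(2, datetime(2024, 1, 2), "b"),
    ]
    batch, mark = incremental_pull(source, datetime.min)  # first run: everything
    print(len(batch))  # 2
    source.append(Row(3, datetime(2024, 1, 3), "c"))
    batch, mark = incremental_pull(source, mark)  # later run: only the new row
    print([r.id for r in batch])  # [3]
```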
When loading incrementally into ADLS Gen2, Data Governor Online appends new data to the end of the existing blobs rather than dropping the target file and recreating it. This works with every source connection type that supports incremental loading.
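The difference between appending and recreating is easiest to see with a CSV file. The sketch below uses a local file as a stand-in for a blob, and the `write_csv` helper and its `mode` values are hypothetical. When writing to ADLS Gen2 directly from Python, the Azure SDK's `DataLakeFileClient` exposes `append_data` and `flush_data` for appending. The example does not claim those are what Data Governor Online uses internally.

```python
import csv
import os


def write_csv(path: str, header: list[str], rows: list[list], mode: str = "append") -> None:
    """Write rows to a CSV file standing in for a blob.

    mode="append":   keep existing content; write the header only if the file is new.
    mode="recreate": drop the file and write header + rows from scratch.
    """
    if mode not in ("append", "recreate"):
        raise ValueError(f"unknown mode: {mode}")
    is_new = mode == "recreate" or not os.path.exists(path)
    with open(path, "w" if mode == "recreate" else "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "orders.csv")
    write_csv(path, ["id", "total"], [[1, "9.99"]])  # first load creates the file
    write_csv(path, ["id", "total"], [[2, "5.00"]])  # incremental load appends
    with open(path) as f:
        print(f.read().splitlines())  # ['id,total', '1,9.99', '2,5.00']
```

Appending keeps earlier rows and writes the header once. Recreating would discard everything loaded before.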
Working with blobs in Data Governor Online
To use a blob as a target in Data Governor Online, you need two things: a blob-based connection and a blob-based File Definition. In this case, the connection is an ADLS Gen2 connection pointing at the desired container and directory.
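A storage account, container and directory together pin down where blobs land. The sketch below shows this using the `abfss://<container>@<account>.dfs.core.windows.net/<path>` URI scheme of the Hadoop ABFS driver for ADLS Gen2. The account, container and directory names are hypothetical, and Data Governor Online's connection form may ask for these values differently.

```python
def adls_target_uri(account: str, container: str, directory: str, file_name: str) -> str:
    """Build the abfss:// URI of a file inside an ADLS Gen2 container (hypothetical helper)."""
    # Drop empty segments and stray slashes so "" or "dir/" don't produce "//".
    path = "/".join(p.strip("/") for p in (directory, file_name) if p.strip("/"))
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"


if __name__ == "__main__":
    print(adls_target_uri("mystorageacct", "sales", "imports/sqlserver", "Customers.csv"))
    # abfss://sales@mystorageacct.dfs.core.windows.net/imports/sqlserver/Customers.csv
```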
Importing Data into Azure Data Lake Storage Gen2
1. Create an Azure Data Lake Storage Gen2 connection. Technical details on configuring an ADLS Gen2 connection can be found here.
2. Create a File Definition for use with the ADLS Gen2 connection. In this demonstration we will save data as CSV files.
3. Create a data migration task with the ADLS Gen2 connection as the target and our new file definition as the target file definition. Data Governor Online supports migrating into ADLS Gen2 from any of our available source connections. However, for this demo, we will import data from a SQL Server database.
4. Select the tables you would like to import. In this example we have selected three.
5. Run the migration. If the job succeeds, you can view the imported tables in Azure Storage Explorer: a CSV file has been created for each selected table.
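The end result of these steps, one CSV file per selected table, can be reproduced locally to get a feel for the output. In the sketch below, Python's built-in `sqlite3` stands in for SQL Server and a local directory stands in for the ADLS Gen2 container. The table names and the `export_tables_to_csv` helper are hypothetical, and the sketch does not describe how Data Governor Online performs the migration.

```python
import csv
import os
import sqlite3


def export_tables_to_csv(conn: sqlite3.Connection, tables: list[str], out_dir: str) -> list[str]:
    """Write each table to <out_dir>/<table>.csv (header row + all rows); return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for table in tables:
        # Table names can't be bound as SQL parameters, so quote them as
        # identifiers, doubling any embedded double quotes.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cursor.description]
        path = os.path.join(out_dir, f"{table}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(cursor)  # the cursor yields one tuple per row
        written.append(path)
    return written


if __name__ == "__main__":
    import tempfile

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customers (id INTEGER, name TEXT);
        CREATE TABLE Orders (id INTEGER, customer_id INTEGER, total REAL);
        CREATE TABLE Products (id INTEGER, name TEXT);
        INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bo');
        INSERT INTO Orders VALUES (10, 1, 9.99);
    """)
    out = tempfile.mkdtemp()
    for path in export_tables_to_csv(conn, ["Customers", "Orders", "Products"], out):
        print(os.path.basename(path))  # Customers.csv, Orders.csv, Products.csv
```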