In our last blog post we demonstrated how to import data from Google BigQuery; you can read more about using Data Governor Online for your BigQuery data migrations here.
Google BigQuery is a serverless, highly scalable data warehouse that is entirely cloud hosted. Data Governor Online is a data orchestration tool that lets you use our numerous connection types, which include Google BigQuery, to perform end-to-end ELT.
This post will go over how to migrate data into Google BigQuery using a Data Migration task in Data Governor Online. For the purposes of this article, we will reuse the Google BigQuery connection we created in our last post.
How to migrate data using the Google BigQuery Connector in Data Governor Online
One difference when migrating data into Google BigQuery, compared with migrating data out of it, is that you need a staging connection. A staging connection is a blob connection that BigQuery reads from using its built-in data loading tools. We support all Data Governor Online source connections for Google BigQuery, but require a Google Cloud Storage Bucket as the staging connection, ensuring maximum performance and stability for BigQuery migrations.
Creating a Google Cloud Bucket Staging Connection
A detailed guide to creating a Google Cloud Bucket Connection can be found here.
For a Google Cloud Storage Bucket Connection, you will need to create a GCP Service Account and retrieve a Credential File. You can learn more about creating a credential file here.
Once you have your credential file, you can find the required URL in your bucket overview, in the ‘Link for gsutil’ field.
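For illustration, the ‘Link for gsutil’ value is a gs:// URI. The small helper below (a sketch, not part of Data Governor Online) shows how the bucket name can be pulled out of that value:

```python
from urllib.parse import urlparse

def bucket_from_gsutil_link(link: str) -> str:
    """Extract the bucket name from a 'Link for gsutil' value, e.g. gs://my-bucket."""
    parsed = urlparse(link)
    if parsed.scheme != "gs":
        raise ValueError(f"Expected a gs:// URI, got: {link}")
    # For a gs:// URI the network-location component is the bucket name.
    return parsed.netloc
```

For example, `bucket_from_gsutil_link("gs://my-bucket")` returns `"my-bucket"`.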
After creating your credential file, store it in the keys directory (on both Windows and Linux). This ensures that the Agent will always be able to retrieve it.
Keep in mind that table names containing special characters such as ‘$’ and ‘.’ may not work with Google Cloud Storage Buckets, as they cannot be safely expressed as a URI on Google Cloud Platform.
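To illustrate this constraint, a simple pre-flight check along these lines (a hypothetical helper, not part of the product) could flag table names that cannot be safely expressed in a Cloud Storage URI:

```python
import re

# Conservative allow-list for staging-safe table names: letters, digits,
# underscores and hyphens. Characters such as '$' and '.' are rejected
# because they cannot be safely expressed as part of a gs:// URI.
_SAFE_TABLE_NAME = re.compile(r"^[A-Za-z0-9_-]+$")

def is_safe_for_staging(table_name: str) -> bool:
    """Return True if the table name is safe to use in a staged file path."""
    return bool(_SAFE_TABLE_NAME.match(table_name))
```

So a name like `orders` passes, while `sales$2020` or `schema.table` would be flagged before the migration runs.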
Alongside our new Google Cloud connection, we also need a File Definition to define where and how the data is staged. A File Definition sets out the constraints and settings used when processing data in a file format, such as the location, type and format of the files. For our File Definition, we will use the Parquet file format and store files in a directory called ‘article’.
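As a rough sketch, the settings our File Definition captures can be thought of as a small config. The field names below are illustrative assumptions, not the product’s actual schema:

```python
# Illustrative only: the File Definition used in this walkthrough,
# expressed as a plain dictionary. Field names are assumptions.
file_definition = {
    "name": "BigQuery staging files",
    "format": "parquet",    # Parquet preserves column types and compresses well
    "directory": "article",  # where staged files are written inside the bucket
}
```

Parquet is a good default here because it is columnar, self-describing, and one of the formats BigQuery can load natively.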
Creating a Data Migration Task
After you have prepared your staging connection, add a new Project and a new Job for the Google BigQuery tasks you will create, such as Data Migration and SQL Statement tasks.
As mentioned earlier, we will use a Data Migration task to import data into Google BigQuery. In the first screen, name your task, select your agent, and select Data Migration as your task type.
Selecting a Connector
In the next window, each drop-down list contains all of your created connections. Select your source connection and Google BigQuery as your target connection, then select your source and target schemas. If you wish to create a new schema as part of the migration, check the ‘Custom Output Schema’ box and provide a name.
Once you select Google BigQuery as a target you will be prompted for a Staging Connection and File Definition. We will be using the newly created Bucket Connection and Blob File Definition.
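Behind the scenes, loading staged Parquet files from a bucket into BigQuery is conceptually similar to a `bq load` invocation. The sketch below builds such a command as an argument list purely for illustration; the dataset, table and URI values are hypothetical, and Data Governor Online handles this step for you:

```python
def bq_load_command(dataset: str, table: str, staging_uri: str) -> list[str]:
    """Build a `bq load` command for a Parquet file staged in Cloud Storage."""
    return [
        "bq", "load",
        "--source_format=PARQUET",  # matches our Parquet File Definition
        f"{dataset}.{table}",        # target dataset and table in BigQuery
        staging_uri,                 # gs:// URI of the staged file
    ]
```

For example, a staged file in our ‘article’ directory would be loaded with something like `bq load --source_format=PARQUET demo.orders gs://my-bucket/article/orders.parquet`.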
Choosing your Migration Data
In the next window, depending on your data, you have the option to migrate tables, views and queries.
We will select our three source tables, and they will appear in the target section beneath.
Once the task is submitted, we can choose to run it immediately or on a schedule.
In the following image, the tables we have chosen to migrate are shown in Microsoft SQL Server Management Studio.