Perspective ILM announces the release of Data Governor Online.

Processing your Big Data workflows just got a whole lot simpler with the introduction of Data Governor (DG) Online. The first Software as a Service (SaaS) product offering from Perspective ILM, DG Online has revolutionised traditional Data Management and Warehousing processes, giving you the ability to set up a cloud-based, Big Data Analytics workflow in a matter of hours.

Brand new features in DG Online facilitate a more streamlined implementation process, requiring only the installation of a lightweight DG Online processor referred to as an Agent.

Unlike previous, server-based versions of Data Governor, DG Online supports a one-to-many relationship between the application and its Agents, making data warehousing across multiple servers and hosts easier: an Agent can be installed under a single tenant but across multiple locations. A fast-to-install, low-footprint processor, the Agent also runs cross-platform, so you can leverage data from sources running on both Windows and Linux. Thanks to the Agent’s Linux support, Data Governor can now provide Big Data connectivity such as Apache Hadoop as a connection type. Read more about the Data Governor Online Agent

New Features - Connections

Data Governor Online introduces a number of key features focused on expediting the execution of Big Data workflows, beginning with new connection types.

Both versions of Data Governor come with an extensive library of connectors, allowing you to automatically connect to almost any data source without the need to write a single line of code. DG Online now also includes native connectors to:

  • Databricks
  • Google Cloud Storage Buckets
  • Amazon S3

These new connection types allow you to expedite your data migration to the cloud irrespective of where the data is sourced (cloud or on-premises, structured or unstructured), and to orchestrate your end-to-end data processing workflow in a variety of languages such as Python, SQL and PowerShell, all from a single, serverless console.
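
For a sense of what these connectors save you from writing, here is a minimal hand-rolled sketch of a single S3-to-Google Cloud Storage copy step using the cloud SDKs directly; the bucket names and object key are placeholders, and credential setup is assumed:

    import boto3
    from google.cloud import storage

    def copy_s3_object_to_gcs(s3_bucket: str, key: str, gcs_bucket: str) -> None:
        # Download the object from Amazon S3 to a local temporary file.
        s3 = boto3.client("s3")
        local_path = f"/tmp/{key.replace('/', '_')}"
        s3.download_file(s3_bucket, key, local_path)

        # Re-upload the same file to a Google Cloud Storage bucket.
        gcs = storage.Client()
        gcs.bucket(gcs_bucket).blob(key).upload_from_filename(local_path)

    # Placeholder names, for illustration only.
    copy_s3_object_to_gcs("source-data-bucket", "raw/events.csv", "target-lake-bucket")

DG Online replaces per-source code like this with configured connections.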

Apache Parquet File Support

Parquet is an open-source file format for Hadoop. Storing data structures in a compressed, columnar format, Parquet differs from traditional row-based file formats such as CSV and is becoming increasingly popular for Big Data workflows. These attributes allow faster processing and querying, and lower storage costs for data files on any serverless platform, including Azure Data Lake.

Perhaps the greatest benefit of the Parquet file format is its natively embedded metadata. The ability to capture and store metadata is integral to the effective data governance and management of a Data Lake. Petabytes of uncatalogued data stored without any structure, regardless of whether the data was structured or unstructured to begin with, will only lead to an increasingly expensive and useless Data Swamp.
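
As a concrete illustration of that embedded metadata, the short pyarrow sketch below writes a small table to Parquet and then reads the row counts and schema straight out of the file itself; the column names are made up:

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Write a small, illustrative table to Parquet (compressed, columnar).
    table = pa.table({"customer_id": [1, 2, 3], "spend": [9.5, 12.0, 3.25]})
    pq.write_table(table, "customers.parquet")

    # The file carries its own metadata: row and column counts, column types
    # and more, all readable without scanning the data itself.
    meta = pq.read_metadata("customers.parquet")
    print(meta.num_rows, meta.num_columns)
    print(pq.read_schema("customers.parquet"))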


DG Online supports the Parquet file format, and using it is as simple as entering “parquet” as your file extension on the “File Definition” tab when setting up your migration.

DG Online will automatically convert your source data to this format before migrating to your defined destination. 
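
For comparison, this automatic conversion replaces a manual step that would otherwise look something like the following in pandas (file names are placeholders; a Parquet engine such as pyarrow must be installed):

    import pandas as pd

    # Manual equivalent of the conversion DG Online performs for you:
    # read a row-based CSV extract and rewrite it as columnar Parquet.
    df = pd.read_csv("source_extract.csv")
    df.to_parquet("source_extract.parquet", index=False)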

Improved Connector Support

DG Online includes enhancements to its existing Connector support capabilities, focused on providing greater stability and improved user experience when sourcing disparate data.

This release now allows you to work with Dynamics 365, Xero, Sugar CRM and Google Analytics.

New Features - Task Types

Managing a Big Data Analytics ecosystem is drastically less complex when you can centralise heterogeneous task types, orchestrate the sequence in which each should run across your end-to-end workflow, and access a central console to view the logs, alerts and errors generated by each task.

With DG Online you can now execute your existing and new:

  • Databricks jobs
  • Notebook, Spark JAR and PySpark tasks
  • Azure Data Factory pipelines
  • HPC workloads using Azure Batch clusters
  • Azure Shipyard workloads
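
To give a sense of what DG Online abstracts away for the first item above, here is a minimal hand-rolled trigger for an existing Databricks job via the Databricks Jobs REST API; the workspace URL, token and job ID are all placeholders:

    import requests

    WORKSPACE = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
    TOKEN = "dapiXXXXXXXXXXXXXXXX"  # placeholder personal access token

    # Trigger a run of an existing job; DG Online handles this call, plus the
    # subsequent polling, logging and alerting, without any code on your side.
    resp = requests.post(
        f"{WORKSPACE}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"job_id": 42},  # placeholder job ID
    )
    resp.raise_for_status()
    print(resp.json()["run_id"])  # the run you would then poll for status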

Creating Clusters

Cluster Definitions are designed to be flexible and dynamic, sharing common components regardless of their requirements.

Think of Cluster Definitions as the blueprints Data Governor uses to create virtual machines.

Cluster Definitions are a means of defining the on-demand clusters and PCs that Data Governor needs to create in order to run a task. DG Online lets you define clusters for individual Databricks jobs that must scale up on demand in order to run.


This feature allows you to define minimum and maximum values for the “workers” used to run the process.


What does this mean? Workers define the number of processing units your cluster has. The higher the number, the more processing units are used upon execution. DG Online will only work within the limits that you set, auto-scaling as required. Note that a higher maximum worker value may result in higher charges from your cloud provider.
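
In Databricks terms, those minimum and maximum worker values map onto an autoscale block in the cluster specification that DG Online builds for you; the node type, Spark version and worker counts below are illustrative placeholders:

    # Illustrative Databricks cluster specification with autoscaling enabled.
    # DG Online assembles an equivalent definition from the values you enter.
    new_cluster = {
        "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
        "node_type_id": "Standard_DS3_v2",    # placeholder VM size
        "autoscale": {
            "min_workers": 2,  # the cluster never shrinks below this
            "max_workers": 8,  # hard ceiling, which also caps your spend
        },
    }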

The native flexibility and adaptability of Data Governor Online reduces the number of moving parts required to perform Big Data analytics. Purpose-built to support a heterogeneous analytics ecosystem, DG Online orchestrates, executes, logs and reports without needing a single line of code.


Getting Started:

For existing customers, it's as easy as clicking on the link below to take DG Online for a test drive. Once you register, notify the Perspective team and you will be able to run DG Online concurrently with your existing Data Governor license for the duration of your existing license agreement.

For new customers, simply take DG Online for a test drive today! You will have access to a feature-complete instance of DG Online for 60 days, for free.

More Information:

Watch the video

Read the documentation