Data Virtualization using DENODO and Snowflake

By April 30, 2020 Blogs

Every enterprise strives to become data-driven, for which it is essential to maintain an efficient turnaround time for analysis and data consolidation or accessibility. This gave rise to two essential concepts – EDW (Enterprise Data Warehouse) and Data Virtualization.

An Enterprise data warehouse is often referred to as a database or a central repository of databases which facilitates hassle-free analytics tasks. Popular EDW solutions today include AWS RedShift, Teradata, Azure SQL DW and Snowflake cloud data platform among others.

Data Virtualization integrates data siloed across disparate sources into a logical layer. It facilitates, data management, governance and data access. Atscale and Denodo two of the leading data virtualization technologies today.

According to a survey completed by Harvard Business Review, 38% of senior executives across industries say “organizational data silos are their biggest hurdle in digital transformation”. Hence making data availability or accessibility from disparate sources – the primary objective of an organization.

In the purview of this article, we discuss an approach to achieving a single source of truth using Denodo and Snowflake.

Denodo provides a data virtualization platform providing data access, availability, governance and delivery capabilities from multiple data sources without moving the data.

Denodo interacts with different data sources using virtualization and helps us visualise large heterogeneous datasets:

Source

Snowflake‘s initial offering was a cloud data warehouse which separated compute, storage and cloud services in different layers. However, now it has evolved into a cloud data platform by recently launching their Data Exchange – by allowing enterprises to use external data in realtime.

In pursuit of achieving a single source of truth, organizations have tried and tested a lot of approaches over time, among others, one of the most commonly adopted approaches has been creating a data lake to have a central repository of raw data.

Data virtualization is a logical data layer that integrates all enterprise data silos across disparate systems and manages unified data for centralized security & governance delivering to business users in real-time.

Why connect snowflake with Denodo, Here we have a few use cases for you

  1. To integrate real-time data from sources like Salesforce or SAP with the data in the Snowflake Cloud Data Warehouse
  2. To make data available for Business Analysts without actually exposing Snowflake data lake or integrities of the snowflake using DENODO’s Data Catalog service
  3. To integrate data in Snowflake cloud data warehouse with existing data sources & systems (now yet migrated to Snowflake) for reporting and BI

Three stages in Denodo:

  1. CONNECT – Easily connect to a host of different data sources like Salesforce, APIs, Excel, RDBMS etc.
  2. COMBINE:
    • Metadata of source tables/ views are imported as physical views
    • Physical Views are joined, transformed and formula column added
    • The column selected from logical views and filters added to create BI Datasets.
  3. CONSUME: It Consumes the data in any mode like tableau reports, CSV,HTML,etc

Connecting Snowflake with Denodo:

Step 1:  Add Snowflake data source using JDBC driver

Step 2 :  Connection Details

Step 3:  Test Connection

Post connection established successfully, click on “Save” button on top.

Step 4: Create a base view on snowflake data

  1. Once the connection is created then click on  “Create base view”
  2. You can see a list of all schema/DBs available in the connected network
  3. Select the tables you want to create the base view on and then click on “create selected”

Step 5: Accessing the view “customer_address”

  1. Click on the view name
  2. Click on the execute icon
  3. Add any conditions if you want and again click the execute button to check the output

Business Users today want ready access and an option to self-explore. For that Denodo’s Data Catalog comes in handy.

Data Catalog is distributed as a web application with  Denodo 7.0. Data Catalog comes with a tableau extension and allows analysts and business users to perform self-service exploration & analytics.

Step1: Click on the start icon of the data catalogue and launch the service
Step 2: Post login always follow below steps to sync data on VDP(virtual data port) with your Catalog

Step 3.1:  Go to the “Browse” tab and select “Databases/Folder”

  1. Select the table which you want to query. You can see the list of all available columns from that table
  2. Drag and drop the columns you want to query in “Output Columns” section
  3. Click on the “Run” icon to check the output

Step 3.2: If Tableau is already installed on your local machine then you can see the output directly in Tableau by selecting export option “Tableau”

Step 4:  Use the “Search” feature of data catalogue if you are not familiar with the tables available in your databases or if you want to check all tables are available

For example:

  1. Enter “Customer” keyword and click on the search icon
  2. You will get a list of Categories/Tags/Databases/Web Services having Customer keyword present in it

Would like to hear about your experience of data virtualization using Denodo or any other virtualization tools with Snowflake.

Pallavi Aher

Pallavi Aher

4.9+ years of experience in Big data, Cloud, NoSQL, scala, Snowflake, Data Engineering, Denodo

One Comment

Leave a Reply