I have been getting a number of requests from customers, colleagues and partners on what to consider when moving from AWS Redshift to Azure SQL Data Warehouse.
This post tries to cover all the basics that need to be taken care of while moving from Redshift to Azure SQL Data Warehouse using Azure Data Factory. For a more detailed step by step guide refer here.
To start, let’s look at the prerequisites that need to be in place before starting the migration:
- AWS key id and AWS access key, to access the database or table that need to be migrated from customer.
- Azure Blob storage (for staging)
- AWS S3 Bucket (To unload data from AWS Redshift)
- Azure SQL Data Warehouse (Data Storage and analytics)
- Azure Data Factory (Orchestration)
Once the prerequisites have been taken into consideration, you need to start the migration process. Initiate the process by following steps:
Configure Azure Data Factory parameters – The parameters that need to be configured depending on various requirements are Datasets, Connections and Pipeline.
Create Azure Blob storage – Fill the basics, advanced and tag information for creating Azure Blob storage.
Create S3 bucket on AWS portal – You will need to keep the access key id, secret key as well as the service link handy
Create Azure Data Warehouse by creating a SQL server to work on the backend of a SQL data warehouse.
Create a New Server and ensure to allow azure services to access server
Create Azure data factory and select Author and Monitor to configure and create pipeline.
Click on Copy Data to create the transaction
New Connections – Creating, configuring and testing new connections
Table mapping – This is done between source and destination table while ensuring that that the schema of source dataset is matching with the schema of destination dataset.
Configure Data movement by staging blob storage and unloading S3 bucket
Linking connection – Linking of both these connections need to be done in the same way as AWS redshift and Azure Data warehouse, only the filling parameters have to be changed as per the earlier configuration
Testing the connection – Test the connection for both staging and unloading to ensure a smooth lift and shift migration.
Finishing the deployment – Click on finishing the deployment after checking the summary tab.
Verifying output – To check the status of the deployment, go to the monitor tab. The Data Warehouse tab allows you to verify the output.
You might face problems while creating linking with AWS Redshift in Data Factory. To troubleshoot this issue, remember to check the Redshift Cluster, click on ‘Modify cluster’ and check if VPC is assigned to the cluster. Also, check other such parameters or review the error log to modify recommended changes.