Key Factors while migrating from ‘Redshift’ to ‘Snowflake’

ruchi khanuja
3 min read · May 31, 2021

Are you planning to migrate from ‘Redshift’ to ‘Snowflake’? In this post I will not focus on ‘WHY’ one should decide to migrate from Redshift to Snowflake; rather, I will discuss at a high level ‘HOW’ one should migrate. Here are the top 5 things to consider as part of high-level migration planning:

  1. Migration of DB Objects: This is about migrating databases, schemas, table structures, and views. Try to keep each object’s structure as-is while migrating: create the same DB objects in Snowflake with the same structure they have on Redshift. You can automate this process with scripts that read Redshift metadata to get the DDL (object definition) of Redshift tables and views. One such metadata view on Redshift is “v_generate_tbl_ddl”; you can find its definition on GitHub (https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminViews/v_generate_tbl_ddl.sql). Using this view, the DDL of the Redshift tables can be retrieved. This DDL then has to be made compatible to run on Snowflake.
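Making the retrieved DDL Snowflake-compatible can itself be scripted. Below is a minimal sketch of that cleanup step: it strips the Redshift-only ENCODE, DISTSTYLE, DISTKEY and SORTKEY clauses from a DDL string. The table and columns are invented for illustration, and a real migration would also need datatype mapping on top of this.

```python
import re

def redshift_ddl_to_snowflake(ddl: str) -> str:
    """Strip Redshift-specific clauses that Snowflake does not accept.
    A minimal sketch -- a real migration also needs datatype mapping."""
    # Remove column-level compression encodings, e.g. "ENCODE az64"
    ddl = re.sub(r"\s+ENCODE\s+\w+", "", ddl, flags=re.IGNORECASE)
    # Remove table-level distribution and sort clauses
    ddl = re.sub(r"\s+DISTSTYLE\s+\w+", "", ddl, flags=re.IGNORECASE)
    ddl = re.sub(r"\s+DISTKEY\s*\([^)]*\)", "", ddl, flags=re.IGNORECASE)
    ddl = re.sub(r"\s+(COMPOUND\s+|INTERLEAVED\s+)?SORTKEY\s*\([^)]*\)", "", ddl,
                 flags=re.IGNORECASE)
    return ddl

# Hypothetical DDL of the kind v_generate_tbl_ddl would return
redshift_ddl = (
    "CREATE TABLE sales (id INT ENCODE az64, amount DECIMAL(10,2)) "
    "DISTSTYLE KEY DISTKEY (id) SORTKEY (id)"
)
print(redshift_ddl_to_snowflake(redshift_ddl))
```

Running this over every DDL pulled from v_generate_tbl_ddl gives statements that Snowflake will accept, since Snowflake simply has no equivalent of these physical-layout clauses.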
  2. Migration of Data: After the objects like databases, schemas, and tables are migrated to Snowflake, the next step is to migrate the data too. One option is to use Redshift’s UNLOAD command to unload data into S3 and then use Snowflake’s COPY command to load this data from S3 into the Snowflake tables. One might see some errors due to compatibility issues during this process. Another option: if you are using a data replication tool that supports Snowflake as a target, then raw data from the source systems can be migrated using the replication tool and loaded into Snowflake. On top of this raw data one can run ETL to populate the fact, dimension, and metric tables on Snowflake.
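The UNLOAD/COPY pair can be generated per table from a script. The sketch below builds the two statements as strings; the S3 path, IAM role ARN, and stage name are placeholders you would replace with your own, and the file-format options (pipe-delimited, gzipped CSV) are just one workable choice.

```python
def build_unload_sql(table: str, s3_path: str, iam_role: str) -> str:
    """Redshift UNLOAD: export a table's rows to S3 as gzipped, pipe-delimited files.
    The S3 path and IAM role here are placeholders."""
    return (
        f"UNLOAD ('SELECT * FROM {table}') "
        f"TO '{s3_path}' IAM_ROLE '{iam_role}' "
        "DELIMITER '|' GZIP ALLOWOVERWRITE"
    )

def build_copy_sql(table: str, stage_path: str) -> str:
    """Snowflake COPY INTO: load the unloaded files from an external stage
    pointing at the same S3 location."""
    return (
        f"COPY INTO {table} FROM {stage_path} "
        "FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' COMPRESSION = GZIP) "
        "ON_ERROR = 'CONTINUE'"
    )

print(build_unload_sql("sales", "s3://my-bucket/sales/",
                       "arn:aws:iam::123456789012:role/redshift-unload"))
print(build_copy_sql("sales", "@my_s3_stage/sales/"))
```

ON_ERROR = 'CONTINUE' lets the load keep going past the compatibility errors mentioned above; Snowflake records the rejected rows so you can inspect and fix them afterwards.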
  3. Migration of Code: Redshift and Snowflake differ in syntax and features in several places. For example, there is no concept of DISTKEY, SORTKEY, or ENCODE in Snowflake. Date functions differ too: GETDATE() in Redshift corresponds to CURRENT_TIMESTAMP() in Snowflake. Snowflake has the VARIANT datatype for supporting semi-structured data like JSON, Avro, and Parquet, whereas Redshift has no such datatype; JSON stored as text in Redshift is instead parsed directly in SQL with Redshift’s JSON functions, some of which are listed below. One has to be careful while migrating code and change it to supported SQL syntax when moving from Redshift to Snowflake.

IS_VALID_JSON

IS_VALID_JSON_ARRAY

JSON_ARRAY_LENGTH

JSON_EXTRACT_ARRAY_ELEMENT_TEXT

JSON_EXTRACT_PATH_TEXT
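Calls to these functions have to be rewritten into Snowflake's VARIANT path syntax. As a sketch, the helper below converts a Redshift `JSON_EXTRACT_PATH_TEXT(col, 'k1', 'k2')` expression into the equivalent Snowflake form `PARSE_JSON(col):k1.k2::STRING`. It is deliberately simplified: it assumes literal string keys and no nested parentheses, and the column name `payload` is invented for the example.

```python
import re

def rewrite_json_extract(expr: str) -> str:
    """Rewrite Redshift JSON_EXTRACT_PATH_TEXT(col, 'k1', 'k2', ...) into
    Snowflake VARIANT path notation: PARSE_JSON(col):k1.k2::STRING.
    Simplified sketch: literal keys only, no nested parentheses."""
    m = re.match(r"JSON_EXTRACT_PATH_TEXT\(\s*(\w+)\s*,\s*(.+)\)\s*$",
                 expr, re.IGNORECASE)
    if not m:
        return expr  # leave non-matching expressions untouched
    col = m.group(1)
    keys = re.findall(r"'([^']+)'", m.group(2))
    return f"PARSE_JSON({col}):{'.'.join(keys)}::STRING"

print(rewrite_json_extract("JSON_EXTRACT_PATH_TEXT(payload, 'user', 'email')"))
```

A handful of such rewrite rules (one per Redshift JSON function, plus GETDATE() → CURRENT_TIMESTAMP()) can cover most of the mechanical part of the code migration; the rest needs manual review.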

4. Data Comparison between Redshift & Snowflake: It is a massive, almost impossible task to manually compare data between Redshift and Snowflake. Here one can write Python scripts in a Jupyter notebook to connect to each DB, run checks against both, and compare the results. Example checks: match the record counts of tables between the two DBs, compare some metrics on fact tables, match the number of objects in both DBs, check for duplicates, and match the datatypes or structures of the tables/views between the two DBs. It is a one-time effort to write these Python scripts in Jupyter, but they can serve as automated test cases, be run multiple times, or be scheduled as daily cron jobs until the matching results between the two DBs are satisfactory.
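The row-count check can be sketched as below. For readability the per-table counts are stubbed as plain dicts; in a real notebook they would be filled by running `SELECT COUNT(*)` against each warehouse (e.g. via psycopg2 for Redshift and snowflake-connector-python for Snowflake), and the table names here are invented.

```python
def compare_counts(redshift_counts: dict, snowflake_counts: dict) -> list:
    """Compare per-table row counts from the two warehouses and
    return a list of human-readable discrepancies (empty = all matching)."""
    issues = []
    # Union of table names catches objects missing on either side
    for table in sorted(set(redshift_counts) | set(snowflake_counts)):
        rs = redshift_counts.get(table)
        sf = snowflake_counts.get(table)
        if rs != sf:
            issues.append(f"{table}: redshift={rs} snowflake={sf}")
    return issues

# Stubbed results standing in for "SELECT COUNT(*)" per table on each side
redshift = {"sales": 1000, "customers": 250}
snowflake = {"sales": 1000, "customers": 249, "orders": 10}
for issue in compare_counts(redshift, snowflake):
    print(issue)
```

The same shape works for the other checks mentioned above (object counts, duplicates, column datatypes): pull a summary from each side, diff the two, and report mismatches, which makes the notebook rerunnable as a regression suite or a daily cron job.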

5. Cost Comparison between Redshift and Snowflake and the Switch: Snowflake offers various features that lower the cost of running a warehouse. It separates storage from compute, facilitates smooth auto-scaling, supports auto-suspend of an idle warehouse within seconds, and provides different types of caching. This leaves very little room for underutilised servers/compute: you pay only for what you use, without having to worry about downtime for scaling up or scaling out. One might be curious to compare the cost of running the warehouse on Redshift versus Snowflake. It’s a good idea to do this cost comparison between the two before the final switch from Redshift to Snowflake.
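A back-of-the-envelope comparison can make the pay-for-what-you-use difference concrete. The sketch below contrasts always-on Redshift nodes with a Snowflake warehouse that auto-suspends outside active hours; every rate and usage figure is purely illustrative, so plug in your own contract prices and workload profile.

```python
def monthly_redshift_cost(node_hourly_rate: float, nodes: int) -> float:
    """Classic on-demand Redshift: the cluster's nodes are billed
    around the clock, busy or idle (30-day month assumed)."""
    return node_hourly_rate * nodes * 24 * 30

def monthly_snowflake_cost(credit_price: float, credits_per_hour: float,
                           active_hours_per_day: float) -> float:
    """Snowflake compute: credits are consumed only while the warehouse
    runs; auto-suspend makes idle hours free (storage billed separately)."""
    return credit_price * credits_per_hour * active_hours_per_day * 30

# Illustrative numbers only -- not actual pricing for either product
print(round(monthly_redshift_cost(node_hourly_rate=0.85, nodes=4), 2))
print(round(monthly_snowflake_cost(credit_price=3.0, credits_per_hour=2,
                                   active_hours_per_day=8), 2))
```

Even a rough model like this, fed with your real query history and rates, usually settles the "is the switch worth it" question before the final cutover.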
