Make Batch Loads easy and fast with Snowflake’s SWAP Command
Batch loading of data in ETL is usually done overnight so Business can use the processed data early in the day.Often for any correction in the data, business has to wait for the next overnight load to happen, which is usually in 24 hours.
In today’s world where data is an asset, the more up-to-date Data is the more relevant it is for business in order to make business critical decisions.Often this batch load in ETL is a heavy and lengthy process and can’t be run during the daytime when the resultant dataset is in-use by the end user or business.Snowflake provides a very useful feature to handle such scenario called ‘SWAP WITH’. Here is how you can use SWAP WITH in ETL and run Batch ETL job anytime and multiple time during the day:
- Create a TEMP table with same structure as the final table that is loaded by the ETL process.
- You can do this by either using Snowflake’s Clone command; apart from creating the cloned table this will also copy the data without using any Snowflake’s computation power or use command create table <tmp_table> as select * from <main_table> where 1=0 , this will duplicate table structure but without data from the main table.
- Use the TEMP table as a target in the ETL process, load it with processed data by running ETL jobs.
- If you wish to run some quality checks , do that on the TEMP table after it is loaded in step 3. Set limit for your data quality checks , if the checks do not meet the data quality criteria stop/fail the ETL at this stage, correct the data and rerun the ETL as said in step 3.
- If quality checks are fine, then at the last step swap this TEMP table with the actual main_table using Snowflake’s SWAP WITH command.This SWAP with command works on the metadata of these two tables and in a nutshell renames them.This command is very fast, gets executed with milliseconds.After the SWAP main_table which is used by the business or your reporting tools is refreshed with latest data within a single transactions and in milliseconds.
By using this method there is No/very less impact on the reporting tool/dashboards that uses data generated by ETL. SWAP WITH command makes sure that resultant data does not go missing for hours when the ETL is still running, refreshed data is loaded into final table within seconds. This is how you can run ETL Batch jobs multiple times during the day and also do quality control of the data without impacting business users or reporting tools.