Loading JSON data into snowflake by running copy command.Here the column type of the table where JSON data will be loaded is of VARIANT type.

copy into <table name> from @stagename file_format = (type=json);

For querying the JSON data either parse through JSON arrays or use FLATTEN function as shown below:

select <table_column_name>:<json_element_name> 

select <table_column_name>:<json_element_name>::<casting_data_type>

select value:<repeating_json_element>::<casting_data_type>
table (flatten( <variant_column_name>,<json_element_name_tobe_flattened> )) ;

Using GET_PATH extract value of JSON data by providing the path name.

select get_path( <variant_column_name>,<root_json_element>.<repeating_json_element>) 

Are you planning to migrate from ‘Redshift’ to ‘Snowflake’? In this post I will not focus on ‘WHY’ one should decide to migrate from ‘Redshift’ to ‘Snowflake’ rather I will discuss on high level ‘HOW’ one should migrate from Redshift to Snowflake.Here are top 5 things to consider as part of high level migration planning:

  1. Migration of DB Objects: This is about migrating DB, Schema, Table Structures, Views.Try to keep the object’s structure as-is while migrating.Create similar DB objects in Snowflake with same structure as it is on Redshift.You can automate this process via scripts which accesses Redshift metadata to…

  1. Redshift has a Columnar storage.In columnar storage data is stored by columns and not by rows.
  2. It has MPP architecture, with Massive Parallel Processing(MPP) Redshift can execute a query horizontally on multiple compute nodes.
  3. With Redshift one can query data stored in Redshift but also data residing in AWS S3 data lake using Redshift Spectrum.AWS S3 and Redshift cluster must be in same aws region.
  4. Redshift has a copy command to copy data from AWS S3 to the Redshift table and the syntax is :- copy <table name> from ‘s3://bucketname/subfolder/filename’ iam_role ‘<redshiftiamrolearn>’ OR copy <table name> from ‘s3://bucketname/subfolder/filename’ access_key_id ‘<aws…

Batch loading of data in ETL is usually done overnight so Business can use the processed data early in the day.Often for any correction in the data, business has to wait for the next overnight load to happen, which is usually in 24 hours.

In today’s world where data is an asset, the more up-to-date Data is the more relevant it is for business in order to make business critical decisions.Often this batch load in ETL is a heavy and lengthy process and can’t be run during the daytime when the resultant dataset is in-use by the end user or…

I have been working with Data Ingestion, both with near real time and batch loading. Today, I am sharing here one such method for loading of streaming data into #snowflake via #snowpipe.

Here are 7 simple steps you can follow to do the setup:

  1. Create a snowflake external stage pointing to the #aws #s3 bucket using aws keys, this bucket is where streaming data lands so, this bucket acts as a lake of data.
  2. Create a Snowpipe in Snowflake with property auto_ingest as TRUE, while creating snowpipe in the copy command specify the snowflake table where this streaming data should…

I spent some time working with Apache #Kafka and also did a small poc on #Kinesis. Often I am asked which one to prefer.Well it totally depends on the cost involved,volume,and kind of human resources available for support.Here is my take on Apache Kafka vs Kinesis.

First difference is the obvious one Apache Kafka is open source and Kinesis is #aws service that you have to pay for as per usage.They both have different terms for the similar functions.What is partition in Kafka is called Shards in Kinesis. Both are distributed platform for messaging.

At the high level,Setting up Kafka…

I have been working with AWS Database Migration service for a long time.It gets challenging to monitor the replication tasks running in AWS Database Migration service specially if there are many.Here is an easy way to setup active monitoring of AWS Database Migration service(DMS) on slack.Follow these simple steps to setup the same:

  1. Login to AWS console- > go to service ‘DMS’ ( Database Migration service)
  2. Click on Event Subscriptions →Give a name to the subscription, and as a target of this event subscription select any existing SNS Topic from the drop down or create a new one.
  3. Select “replication-task”…

Here are few steps to help you query raw data on S3 using AWS Athena:

  1. Login into AWS console-> go to services and select Athena.
  2. Under the Data Source-> default is “AWS DataCatalog”, leave it as is.
  3. Under the Database -> default database is “default”, you can select this database or create a database for your usage by running command “create database <databasename>”
  4. Make sure the under database-> the database created in step 3 is selected now.
  5. Next a new table can be created under this database.In …

Are you trying to setup ETL using AWS Glue?AWS Glue is ETL service provided by AWS.It is serverless, so you just pay for the resources used and not for the infra.Before you start with AWS Glue here are some of components that one should be aware of in order to setup ETL process using AWS Glue:

  1. Data Store → Source of data to be used in the Glue ETL process for e.g. AWS S3
  2. Crawler → Glue Crawler fetches schema information, other metadata from the various data sources.
  3. Classifiers → Recognizes the format of the data for e.g. …

ruchi khanuja

Data Engineer, BI, Data Analytics,DWH

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store