MySQL to Snowflake Migration Guide
What’s inside:
1. Why Migrate?
2. Strategy: Thinking About Your Migration
3. Migrating Your Existing MySQL Database
4. Migrate Using Traditional Backup and PUT/COPY Operations
   1. Extract Data from MySQL
   2. Data Types and Formatting
   3. Stage Data Files
   4. Copy Staged Files to Snowflake Table
   5. Incremental Data Load
   6. Incremental Extract from MySQL
   7. Update Snowflake Table
WHY SNOWFLAKE?
Snowflake’s innovations break down the technology and architecture barriers that
organizations still experience with other data warehouse vendors. Only Snowflake has
achieved all six of the defining qualities of a data warehouse built for the cloud, a few of
which are highlighted below:
➔ ZERO MANAGEMENT
Snowflake reduces complexity with built-in performance, so there’s no infrastructure
to tweak, no knobs to turn and no tuning required.
➔ ALL YOUR DATA
Create a single source of truth to easily store, integrate and extract critical insight
from petabytes of structured and semi-structured data (JSON, Avro, ORC, Parquet
or XML).
➔ DATA SHARING
Snowflake extends the data warehouse to the Data Sharehouse™, with direct,
governed and secure data sharing in real time, so enterprises can easily forge
one-to-one, one-to-many and many-to-many data sharing relationships.
Approach to Migration -
The decision whether to move data and processes in one bulk operation or in a staged
approach depends on several factors:
1. The nature of your current data analytics platform
2. The types and number of data sources
3. The time available to move the legacy system to Snowflake
In our case, we will take a dump of the tables/databases and copy it across the internet
into a pre-deployed target Snowflake account. Although this lift and shift can be done
manually, the process can and should be automated with ETL tools.
BENEFITS -
● Fast migration to the new system
● Reduced risk compared to replatforming and refactoring
● Lower initial cost compared to replatforming and refactoring
● Thanks to the many cloud-native and partner tools available, the process can be highly
automated with limited or no downtime.
RISKS -
MIGRATION PLAN -
To successfully migrate your enterprise database to Snowflake, develop and follow a logical
plan that includes the steps presented in this section.
1. MOVING YOUR DATA MODEL
   1. Using a data modeling tool (MySQL Workbench, erwin)
   2. Using existing DDL scripts
   3. Creating new DDL scripts using mysqldump:
      mysqldump --no-data -u someuser -papples mydatabase > db_name_ddl.sql
2. MOVING YOUR EXISTING DATA SET
   1. Moving data using an ETL tool (Fivetran, Stitch, etc.)
   2. Moving data using traditional backup utilities (mysqldump) and setting up CDC
You need only basic DDL, such as CREATE TABLE, CREATE VIEW and CREATE
SEQUENCE. Once you have these scripts, you can log into your Snowflake account to
execute them through the UI or the command line tool SnowSQL.
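For example, assuming SnowSQL is installed and db_name_ddl.sql was generated as shown above, the scripts could be executed non-interactively like this (the account, user, database and schema names are placeholders; note that DDL produced by mysqldump must first be edited into Snowflake-compatible syntax and data types):
snowsql -a <account_identifier> -u <username> -d <database_name> -s <schema_name> -f db_name_ddl.sql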
If you have a data modeling tool, but the model is not current, we recommend you reverse
engineer the current design into your tool, then follow the approach outlined above.
After building your objects in Snowflake, move the historical data loaded in your MySQL
system over to Snowflake.
Moving Data Using an ETL Tool (Fivetran, Stitch, Alooma, etc.)
You can use a third-party migration tool (see Appendix A), an ETL tool or a manual process.
When choosing an option, consider how much data you have to move. For example, to
move 10s or 100s of terabytes up to a few petabytes of data, a practical approach is to
extract the data to files and move it via a service such as AWS Snowball or Azure Data Box.
If you have to move 100s of petabytes or even exabytes of data, AWS Snowmobile or Azure
Data Box are available options.
If you choose to move your data manually, you will need to extract the data for each table to
one or more delimited flat files in text format, using one of the many methods available for
MySQL, such as mysqldump or mydumper, to pump the data out in the desired format.
Then upload these files into a Snowflake stage: use the PUT command for an internal stage,
or your cloud provider's tools for an external stage such as an Amazon S3 bucket. We
recommend these files be between 100MB and 1GB to take advantage of Snowflake’s
parallel bulk loading.
After you have extracted the data and moved it to S3, you can begin loading the data into
your table in Snowflake using the COPY command. You can check out more details about
our COPY command in our online documentation.
PROCEDURE TO MIGRATE THE DATABASE USING
TRADITIONAL BACKUP AND PUT/COPY
OPERATIONS STEP BY STEP.
The high-level steps for a MySQL to Snowflake migration are:
1. Extract data from MySQL
2. Map data types and format the data
3. Stage the data files
4. Copy the staged files into the Snowflake table
5. Plan the incremental data load
6. Extract incremental changes from MySQL
7. Update the Snowflake table
1. Extract Data from MySQL
Broadly, there are two methods to extract data from MySQL. One is using the command line
tool mysqldump, and the other is running a SQL query with the MySQL client and saving the
output to files.
mysqldump is a client utility shipped by default with a standard MySQL installation. Its main
use is to create a logical backup of a database or table. It can be used to extract one table
as shown below:
mysqldump -u <username> -h <host_name> -p database_name my_table > my_table_out.sql
Here, the output file my_table_out.sql will contain INSERT statements along these lines
(columns and values are illustrative):
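INSERT INTO `my_table` VALUES (1,'Alice','2023-01-15 10:00:00'),(2,'Bob','2023-02-20 18:30:00');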
To convert this format into a CSV file, you have to write a small script or use one of the
available open-source libraries. Refer to the official MySQL documentation for more
information.
If you can run the extract on the same machine where the MySQL server runs (so that you
can write to the server's file system), you have a simpler option to get CSV directly. Use the
command below to get a CSV file:
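The sketch below uses SELECT ... INTO OUTFILE (the credentials, table and output path are placeholders; INTO OUTFILE requires the FILE privilege and writes the file on the database server's host):
mysql -u <username> -p -e "SELECT * FROM my_table INTO OUTFILE '/tmp/my_table.csv'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';" database_name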
Alternatively, SQL commands can be executed using the MySQL client utility with the output
redirected to a file. The output can then be transformed using text-editing utilities like sed or
awk to clean and format the data.
Example:
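A minimal illustration (the -B batch option prints tab-separated rows, which sed then converts to commas; a real script must also handle embedded delimiters, quotes and NULLs):
mysql -B -u <username> -p -h <host_name> database_name -e "SELECT * FROM my_table;" | sed "s/\t/,/g" > my_table_data.csv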
2. Data Types and Formatting
MySQL data types map to Snowflake data types as follows:

MySQL Data Type    Snowflake Data Type
TINYINT            TINYINT
SMALLINT           SMALLINT
MEDIUMINT          INTEGER
INT                INTEGER
BIGINT             BIGINT
DECIMAL            DECIMAL
BIT                BOOLEAN
CHAR               CHAR
VARCHAR            VARCHAR
BINARY             BINARY
VARBINARY          VARBINARY
ENUM               No ENUM type. Use any type that can represent the values in the ENUM.
SET                No SET type. Use any type that can represent the values in the SET.
DATE               DATE
TIME               TIME
DATETIME           DATETIME
TIMESTAMP          TIMESTAMP
● Snowflake supports most date/time formats, and the format can be specified explicitly
while loading data into a table using the File Format option (discussed in detail later). For
the complete list of supported formats, see the Snowflake documentation.
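For instance, a named file format declaring the field delimiter and the date/timestamp layouts used in the extracted CSV files might look like this (the name my_csv_format and the format strings are illustrative):
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  DATE_FORMAT = 'YYYY-MM-DD'
  TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS';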
3. Stage Data Files
To insert MySQL data into a Snowflake table, the data files first need to be uploaded to a
temporary location called a stage. Snowflake supports internal and external stages.
Internal Stage
Each user and table is automatically allocated an internal stage for staging data files. You
can also create named internal stages.
Named internal stages are created explicitly by the user with SQL statements. They provide a
greater degree of flexibility while loading data: you can assign a file format and other options
to a named stage, which makes data loading easier.
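For example, a named internal stage that reuses the my_csv_format file format sketched above could be created as follows (the stage name mysql_stage is also used in the PUT and COPY examples below):
CREATE OR REPLACE STAGE mysql_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');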
While working with Snowflake you will need to run many DDL and DML statements in SQL, as
well as data-loading commands such as PUT and COPY, as shown below. SnowSQL is a very
handy CLI client that can be used to run those commands and is available for
Linux, macOS and Windows.
Example:
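A typical invocation to open an interactive session looks like this (all values are placeholders):
snowsql -a <account_identifier> -u <username> -w <warehouse_name> -d <database_name> -s <schema_name>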
The PUT command is used to stage data files to an internal stage. The basic syntax of the
command is as given below:
PUT file://<path_to_file>/<filename> <internal_stage_name>
Example:
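Assuming the CSV extracted earlier was saved locally as /tmp/my_table_data.csv and the named stage mysql_stage exists:
PUT file:///tmp/my_table_data.csv @mysql_stage;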
There are many useful options, such as setting the degree of parallelism for the upload
(PARALLEL) and automatic compression of the data files (AUTO_COMPRESS).
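For instance, both options can be set explicitly (the values are illustrative):
PUT file:///tmp/my_table_data.csv @mysql_stage PARALLEL = 4 AUTO_COMPRESS = TRUE;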
External Stage
Data can be uploaded to an external stage using the respective cloud vendor's interfaces. For
S3, you can upload using the web console, the AWS CLI, an SDK or third-party tools.
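For illustration (the bucket name, path and credentials are placeholders), you could upload the file with the AWS CLI and then register the bucket as an external stage in Snowflake:
aws s3 cp /tmp/my_table_data.csv s3://my-migration-bucket/tutorials/dataloading/
CREATE OR REPLACE STAGE mysql_ext_stage
  URL = 's3://my-migration-bucket/'
  CREDENTIALS = (AWS_KEY_ID = '<aws_key_id>' AWS_SECRET_KEY = '<aws_secret_key>')
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');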
4. Copy Staged Files to Snowflake Table
The COPY INTO command loads the contents of the staged file(s) into a Snowflake table.
This command needs compute resources, in the form of a running virtual warehouse, to
execute.
Example (the target table name mysql_table is a placeholder):
To load from a named internal stage:
COPY INTO mysql_table
FROM @mysql_stage;
To load a specific file from an external stage:
COPY INTO mysql_table
FROM @mysql_ext_stage/tutorials/dataloading/contacts1.csv;
To load files from a stage that match a pattern:
COPY INTO mysql_table
FROM @mysql_ext_stage
pattern='.*/.*/.*[.]csv[.]gz';
Some common format options supported by the COPY command for CSV files are the
following:
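(A non-exhaustive selection; all of these are documented Snowflake CSV format options.)
● COMPRESSION: compression of the staged data files (AUTO, GZIP, etc.)
● RECORD_DELIMITER: character separating records, newline by default
● FIELD_DELIMITER: character separating fields, comma by default
● SKIP_HEADER: number of header lines to skip
● FIELD_OPTIONALLY_ENCLOSED_BY: character used to enclose string values, e.g. '"'
● DATE_FORMAT / TIME_FORMAT / TIMESTAMP_FORMAT: formats of date and time values in the files
● NULL_IF: strings to be replaced with SQL NULL
These options can also be passed inline in the COPY command, for example:
COPY INTO mysql_table
FROM @mysql_stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1 NULL_IF = ('NULL', ''));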
5. Incremental Data Load
After the initial full data set is loaded into the target table, changed data is usually extracted
from the source and migrated to the target table at a regular interval. For small tables, a full
data dump can sometimes be used even for recurring migration, but for larger tables we
have to take a delta approach.
6. Incremental Extract from MySQL
To get only the records modified after a particular time, run SQL with appropriate predicates
against the table and write the output to a file; see the sketch below. mysqldump is not useful
here, as it always extracts the full data set. Note that records deleted physically in MySQL will
be missing from the extract and therefore will not be reflected in the target.
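A minimal sketch, assuming the table carries a last-modified column named updated_at (the column name and cutoff timestamp are placeholders):
mysql -B -u <username> -p database_name -e "SELECT * FROM my_table WHERE updated_at > '2023-01-01 00:00:00';" | sed "s/\t/,/g" > my_table_delta.csv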
7. Update Snowflake Table
Snowflake supports row-level updates, which makes delta data migration much easier. The
basic idea is to load the incrementally extracted data into an intermediate table and then
modify records in the final table according to the data in the intermediate table.
We can choose between three methods to modify the final table once the data is loaded into
the intermediate table.
● Update the existing rows in the final table and insert new rows from the
intermediate table which are not in the final table (the id and value columns
are illustrative).
UPDATE final_table t SET value = s.value
FROM intermed_table s WHERE t.id = s.id;
INSERT INTO final_table (id, value)
SELECT id, value FROM intermed_table WHERE id NOT IN (SELECT id FROM final_table);
● Delete all rows from the final table which are present in the intermediate
table, then insert all rows from the intermediate table into the final table.
DELETE FROM final_table f USING intermed_table i WHERE f.id = i.id;
INSERT INTO final_table (id, value) SELECT id, value FROM intermed_table;
● MERGE statement: insert and update can be done with a single MERGE
statement, which applies the changes in the intermediate table to the
final table.
MERGE INTO final_table t1 USING intermed_table t2 ON t1.id = t2.id
WHEN MATCHED THEN UPDATE SET t1.value = t2.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (t2.id, t2.value);
Since Snowflake uses ANSI-compliant SQL, most of your existing queries will execute on
Snowflake without requiring changes. However, MySQL includes a number of MySQL-specific
SQL extensions, so you need to watch out for a few constructs, such as backtick-quoted
identifiers and functions like GROUP_CONCAT (Snowflake uses double-quoted identifiers and
LISTAGG). See Appendix C for details and suggested translations.
Another common change relates to the formatting of date constants used for comparisons
in predicates.
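For illustration (the column name order_date and the date format are assumptions), the same predicate might be written as follows in MySQL and in Snowflake:
In MySQL:
WHERE order_date = STR_TO_DATE('15-01-2023', '%d-%m-%Y')
In Snowflake:
WHERE order_date = TO_DATE('15-01-2023', 'DD-MM-YYYY')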
Migrating BI tools
Many of your queries and reports are likely to use an existing business intelligence (BI) tool.
Therefore, you’ll need to account for migrating those connections from MySQL to Snowflake.
You’ll also have to test those queries and reports to be sure you’re getting the expected
results.
This should not be difficult since Snowflake supports standard ODBC and JDBC
connectivity, which most modern BI tools use. Many of the mainstream tools have native
connectors to Snowflake. Check our website to see if your tools are part of our ecosystem.
Don’t worry if your tool of choice is not listed. You should be able to establish a connection
using either ODBC or JDBC. If you have questions about a specific tool, your Snowflake
contact will be happy to help.