
Sai Kiran G
Kiran.gattineni8@gmail.com
+1 (913) 732-1791

PROFESSIONAL SUMMARY

• Over 8 years of professional IT experience working with various legacy database systems as well as Big Data technologies.
• Experienced with Big Data technologies such as GCP, Amazon Web Services (AWS), Microsoft Azure, Cassandra, Hive, NoSQL databases (HBase, MongoDB), and SQL databases (Oracle, SQL Server, PostgreSQL, MySQL, Snowflake).
• Hands-on experience with Amazon Web Services (Amazon EC2, Amazon S3, Amazon RDS, Amazon Elastic Load Balancing, Amazon SQS, AWS Identity and Access Management, Amazon SNS, CloudWatch, Amazon Elastic Block Store (EBS), Amazon CloudFront, VPC, DynamoDB, Lambda, Redshift, and other AWS services).
• Developed and deployed various Lambda functions in AWS with built-in AWS Lambda libraries (a brief handler sketch appears after this summary).
• Experience in analyzing data using Big Data Ecosystem including HDFS, Hive, HBase,
Zookeeper, PIG, Sqoop, and Flume.
• Strong experience migrating other databases to Snowflake.
• Knowledge and working experience with big data tools like Hadoop, Azure Data Lake, and AWS Redshift.
• Good experience in software development with Python (libraries: Beautiful Soup, NumPy, SciPy, Pandas DataFrames, Matplotlib, NetworkX, urllib2, MySQLdb for database connectivity) and IDEs such as Sublime Text, Spyder, PyCharm, and Visual Studio Code.
• Job workflow scheduling and monitoring using tools like Apache Airflow, Oozie, Cron, and IBM Tivoli.
• In depth Knowledge of Snowflake Database, Schema and table structure.
• Good experience with Cloudera, Hortonworks, and Apache Hadoop distributions.
• Experience in workflow scheduling with Airflow, AWS Data Pipelines, Azure, SSIS, etc.
• Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
• Acumen in data migration from relational databases to the Hadoop platform using Sqoop.
• Experience in writing subqueries, stored procedures, triggers, cursors, and functions on SQL, SQLite, and PostgreSQL databases.
• Good understanding of Big Data Hadoop and YARN architecture along with various Hadoop daemons such as Job Tracker, Task Tracker, NameNode, DataNode, Resource/Cluster Manager, and Kafka (distributed stream processing).
• Experience in Text Analytics, Data Mining solutions to various business problems and
generating data visualizations using Python.
• Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
• Good understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames,
Spark Streaming, Driver Node, Worker Node, Stages, Executors and Tasks.
• Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
• Experience in development and support knowledge on Oracle, SQL, PL/SQL.
• Excellent experience in creating cloud-based solutions and architectures using Amazon Web Services (Amazon EC2, Amazon S3, Amazon RDS) and Microsoft Azure.
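
As a point of reference for the Lambda work above, here is a minimal sketch of an S3-triggered handler in Python; the event shape follows the standard S3 notification payload, and the logging-only body is an illustrative assumption rather than any specific production function.

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """Illustrative S3-triggered handler: log each uploaded object's location."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]   # bucket name from the S3 event payload
        key = record["s3"]["object"]["key"]       # object key from the S3 event payload
        logger.info("New object received: s3://%s/%s", bucket, key)
    return {"statusCode": 200, "body": json.dumps({"processed": len(records)})}
```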

Technical Skills:

Big Data Ecosystem: HDFS, MapReduce, PySpark, Hive, Airflow, Sqoop, HBase
Hadoop Distributions: Microsoft Azure (Databricks, Data Lake, Blob Storage, Azure Data Factory, SQL Database, SQL Data Warehouse); Amazon AWS (EMR, EC2, EBS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, SQS, DynamoDB, Redshift, ECS); Apache Hadoop 2.x/1.x
Scripting Languages: Python, JavaScript, R, PowerShell, HiveQL, Perl
Cloud Environment: Amazon Web Services (AWS), Microsoft Azure
NoSQL Databases: DynamoDB, HBase
Databases: MySQL, Oracle, Teradata, MS SQL Server
ETL/BI: Snowflake, SSIS, Power BI
Operating Systems: Linux (Ubuntu, CentOS, RedHat), Windows, Unix
Version Control: Git, Bitbucket
Others: Jupyter Notebook, Kubernetes, Jenkins, Jira

Work Experience:

Client: Univar Solutions
Location: The Woodlands, TX
Role: Data Engineer Dec 2021 - Present

Responsibilities:

 Collaborating with the business on requirements gathering for the data warehouse & reporting.
 Extracting, transforming, and loading data from different sources to Azure Data Storage services using Azure Data Factory and T-SQL to perform data lake analytics.
 Working on data transformations for MLOps - adding calculated columns, managing relationships, creating different measures, merging & appending queries, replacing values, splitting columns, grouping by, and handling date & time columns.
 Data Ingestion to Azure Services - Azure Data Lake, Azure Storage, Azure SQL, Azure DW, and
processing the data in Azure Databricks.
 Creating batches and sessions to move data at specific intervals and on-demand using Server
Manager.
 Blended multiple data connections and created multiple joins across the various data sources for
data preparation.
 Extracted data from Data Lakes, EDW to relational databases for analyzing and getting more
meaningful insights using SQL Queries and PySpark.
 Developed PL/SQL scripts to extract data from multiple data sources and transform them into a format that can be easily analyzed.
 Developing Python scripts to perform file validations in Databricks and automating the process using ADF (an illustrative sketch follows this list).
 Developed JSON scripts for deploying data-processing pipelines in Azure Data Factory (ADF) using the SQL activity.
 Supported production data pipelines including performance tuning and troubleshooting of SQL,
Spark, and Python scripts.
 Developing audit, balance, and control framework using SQL DB audit tables to control the
ingestion, transformation, and load process in Azure.
 Creating tables in Azure SQL DW for data reporting and visualization for business requirements.
 Creating visualization reports, dashboards, and KPI scorecards using Power BI desktop.
 Designing, developing, and deploying ETL solutions using SQL Server Integration Services
(SSIS).
 Connecting various applications to the existing database and creating databases and schema objects, including indexes and tables, by writing various functions, stored procedures, and triggers.
 Performed normalization and de-normalization of existing tables for query optimization and fast query retrieval, with effective use of joins & indexes.
 Creating alerts on data integration events (success/failure) and monitoring them.
 Collaborating with product managers, scrum masters, and engineers on Agile practices and documentation initiatives for retrospectives, backlog grooming, and meetings.
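
As referenced in the file-validation item above, the following is a minimal sketch of a Databricks-style validation in PySpark; the expected column set, CSV format, and ADLS path (normally supplied as a parameter by the ADF trigger) are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Hypothetical expected columns for the incoming file (illustrative only).
EXPECTED_COLUMNS = {"customer_id", "order_date", "amount"}

def validate_file(path: str) -> bool:
    """Basic checks: the file is readable, has the expected columns, and is not empty."""
    spark = SparkSession.builder.getOrCreate()
    df = spark.read.option("header", True).csv(path)

    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns in {path}: {sorted(missing)}")
    if df.limit(1).count() == 0:
        raise ValueError(f"File {path} contains no rows")
    return True

# In practice the path would arrive as a job parameter from the ADF pipeline.
validate_file("abfss://raw@examplestorage.dfs.core.windows.net/orders/2024-01-01.csv")
```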

Client: Country Financial
Location: Bloomington, Illinois
Role: Sr. Spark Developer Sep 2020 - Dec 2021

Responsibilities:

 Designed a data pipeline to automate the ingestion, processing, and delivery by processing
batch and streaming data using Spark, AWS EMR Clusters, Lambda, and Databricks.
 Developed Airflow automation and Python scripts for batch data processing, ETL, and data
warehouse ingestion using AWS Lambda Python functions, Elastic Kubernetes Service (EKS),
and S3.
 Ingested data into a data lake (S3) and used AWS Glue to expose the data to Redshift.
 Configured EMR cluster for data ingestion and used dbt (data build tool) to transform the data in
Redshift.
 Ran batch processing to calculate the associated risk and generate several feeds to other systems, such as Discounted Cash Flow (DCF), PnL, and the Europe credit platform for pricing strategy.
 Wrote & tested SQL code for transformations using the data build tool.
 Designed and developed a data architecture to load data from AWS S3 to Snowflake via Airflow by creating DAGs, and processed data for data visualization tools (a minimal DAG sketch follows this list).
 Worked on creating data pipelines with Airflow to schedule AWS jobs for performing incremental
loads and used Flume for weblog server data.
 Scheduled Airflow jobs to automate the ingestion process into the data lake using Apache Airflow
in a cluster.
 Evaluated Snowflake design considerations for any change in the application.
 Developed PL/SQL procedure to load data into a data warehouse.
 Wrote Python scripts and used Airflow DAGs to automate the process of extracting weblogs.
 Developed and implemented Hive Bucketing and Partitioning.
 Loaded data into S3 buckets using AWS Glue and PySpark; involved in filtering data stored in S3 buckets using Elasticsearch and loading it into Hive external tables.
 Worked on financial spreading by developing scalable applications for real-time ingestions into
various databases using AWS Kinesis and performed necessary transformations and aggregation to
build the common learner data model and store the data in HBase.
 Orchestrated multiple ETL jobs using AWS step functions and Lambda and used AWS Glue to load
and prepare data Analytics for customers.
 Worked on AWS Lambda to run code without managing servers and to trigger code execution from S3 and SNS events.
 Developed data transition programs from DynamoDB to AWS Redshift (ETL Process) using AWS
Lambda by creating functions in Python for certain events based on use cases.
 Implemented the AWS cloud computing platform by using RDS, Python, DynamoDB, S3, and
Redshift.
 Worked on developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing & transforming the data to uncover insights into customer usage patterns.
 Worked with various formats of files like delimited text files, clickstream log files, Apache log
files, Avro files, JSON files, and XML Files.
 Skilled in using different columnar file formats such as RCFile, ORC, and Parquet.
 Performed Database activities such as Indexing, and performance tuning.
 Collected data using Spark Streaming from AWS S3 bucket in near-real-time and performed
necessary transformations on the fly to build the common learner data model.
 Responsible for loading and transforming huge sets of structured, semi-structured, and unstructured
data.
 Used AWS EMR to create Hadoop and Spark clusters, which are used for submitting and executing Python applications in production.
 Designed and developed end-to-end ETL processing from Oracle to AWS using Amazon S3, EMR,
and Spark.
 Worked on CI/CD solution, using Git, Jenkins, Docker, and Kubernetes to set up and
configure big data architecture on AWS cloud platform.
 Wrote SQL and PL/SQL scripts to extract data from the database to meet business requirements and for testing purposes.
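
As noted in the S3-to-Snowflake item above, here is a minimal sketch of such an Airflow DAG; the DAG id, schedule, and COPY INTO statement are illustrative assumptions, and a real task would execute the statement through the Snowflake provider or connector instead of printing it.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def load_s3_to_snowflake(**context):
    # Placeholder load step: a real implementation would run this COPY INTO
    # against Snowflake (stage name and target table are hypothetical).
    print("COPY INTO analytics.orders FROM @s3_raw_stage/orders/ FILE_FORMAT = (TYPE = PARQUET)")

default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="s3_to_snowflake_daily",        # illustrative DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    PythonOperator(
        task_id="load_s3_to_snowflake",
        python_callable=load_s3_to_snowflake,
    )
```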

Client: Intuit
Location: Plano, Texas
Role: Big Data Developer Sep 2018 - Aug 2020

Responsibilities:

 Wrote MapReduce code to process all the log files with rules defined in HDFS (log files generated by different devices have different XML rules).
 Involved in porting the existing on-premises Hive code to GCP (Google Cloud Platform) BigQuery.
 Involved in migrating an Oracle SQL ETL to run on Google Cloud Platform using cloud data processing & BigQuery, with Cloud Pub/Sub triggering the Apache Airflow jobs.
 Developed and designed application to process data using Spark.
 Developed MapReduce jobs, Hive & PIG scripts for Data warehouse migration project.
 Developed stored procedures/views in Snowflake and used them in Talend for loading dimensions and facts.
 Developed and designed a system to collect data from multiple portals using Kafka and then process it using Spark (see the streaming sketch after this list).
 Developing MapReduce jobs, Hive & PIG scripts for Risk & Fraud Analytics platform.
 Developed Data ingestion platform using Sqoop and Flume to ingest Twitter and Facebook data
for Marketing & Offers platform.
 Developed and designed automated processes using shell scripting for data movement and purging.
 Developed programs in Java and Scala (Spark) to reformat data after extraction from HDFS for analysis.
 Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL.
 Participated in the development, improvement, and maintenance of the Snowflake database application.
 Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective
querying on the log data.
 Importing and exporting data into Impala, HDFS and Hive using Sqoop.
 Responsible for managing data coming from different sources.
 Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for efficient data access.
 Developed Hive tables to transform, analyze the data in HDFS.
 Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
 Developed Simple to Complex Map Reduce Jobs using Hive and Pig.
 Involved in running Hadoop Jobs for processing millions of records of text data.
 Developed the application by using the Struts framework.
 Created connection through JDBC and used JDBC statements to call stored procedures.
 Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
 Developed Pig UDFs to pre-process the data for analysis.
 Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
 Moved all RDBMS data into flat files generated from various channels to HDFS for further
processing.
 Developed job workflows in Oozie to automate the tasks of loading the data into HDFS.
 Handled importing of data from various data sources, performed transformations using Hive,
MapReduce, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop.
 Writing the script files for processing data and loading to HDFS.
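
As referenced in the Kafka item above, this is a minimal Spark Structured Streaming sketch for collecting portal events from Kafka; the broker list, topic name, and console sink are hypothetical placeholders (the job also assumes the spark-sql-kafka package is available).

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("portal-events-stream").getOrCreate()

# Hypothetical Kafka source: broker addresses and topic name are placeholders.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "portal_events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary, so cast the payload to string before parsing.
parsed = events.select(col("key").cast("string"), col("value").cast("string"))

# Write the parsed stream out (to the console here, purely for illustration).
query = (
    parsed.writeStream
    .format("console")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/portal_events")
    .start()
)
query.awaitTermination()
```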

Client: Novartis
Location: Parsippany, New Jersey
Role: Hadoop Developer July 2017 - Sep 2018

Responsibilities:

 Writing the script files for processing data and loading to HDFS.
 Processed data into HDFS by developing solutions.
 Analyzed the data using MapReduce, Pig, and Hive, and produced summary results from Hadoop for downstream systems.
 Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
 Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing
the data onto HDFS.
 Built pipelines using Unix shell scripting to connect different tools, extract data from a database, transform the data, and load it into a data warehouse.
 Used Sqoop to import and export data between HDFS and RDBMS (a brief import sketch follows this list).
 Exported the analyzed data to the relational database MySQL using Sqoop for visualization and to
generate reports.
 Created HBase tables to load large sets of structured data.
 Managed and reviewed Hadoop log files.
 Involved in providing inputs for estimate preparation for the new proposal.
 Worked extensively with Hive DDLs and Hive Query Language (HQL).
 Developed UDF, UDAF, and UDTF functions and implemented them in Hive queries.
 Implemented Sqoop for large dataset transfers between Hadoop and RDBMSs.
 Created MapReduce jobs to convert periodic XML messages into partitioned Avro data.
 Used Sqoop widely to import data from various systems/sources (like MySQL) into HDFS.
 Created components like Hive UDFs for missing functionality in HIVE for analytics.
 Used different file formats like Text files, Sequence Files, Avro.
 Provided cluster coordination services through ZooKeeper.
 Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
 Assisted in cluster maintenance, cluster monitoring, adding and removing cluster nodes, and troubleshooting.
 Installed and configured Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
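
As referenced in the Sqoop item above, here is a minimal sketch of a Sqoop import wrapped in a Python script; the JDBC URL, credentials file, table, and HDFS target directory are hypothetical placeholders.

```python
import subprocess

# Hypothetical connection details; a real job would read these from a secure config.
SQOOP_IMPORT = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db-host:3306/sales",
    "--username", "etl_user",
    "--password-file", "/user/etl/.db_password",
    "--table", "orders",
    "--target-dir", "/data/raw/orders",
    "--num-mappers", "4",
]

def run_sqoop_import():
    """Run the Sqoop import and fail loudly on a non-zero exit code."""
    subprocess.run(SQOOP_IMPORT, check=True)

if __name__ == "__main__":
    run_sqoop_import()
```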

Client: Concentrix Technology
Location: Bangalore, India
Role: Data Analyst May 2015 - Jun 2017

Responsibilities:

 Designed & built reports, processes, and analyses with a variety of business intelligence tools &
Technologies.
 Transformed data into meaningful insights from various data sources to support the development of
global strategy and initiatives.
 Involved in requirements gathering, source data analysis, identified business rules for data
migration, and for developing data warehouse/data mart.
 Collected data using SQL scripts, created reports using SSRS, and used Tableau for data visualization and custom report analysis.
 Created reports in tab
 Performed Exploratory Data Analysis (EDA) to find and understand interactions between different fields in the dataset, handling missing values, detecting outliers, examining the data distribution, and extracting important variables graphically (a small EDA sketch follows this list).
 Worked with Python libraries NumPy, Pandas, and SciPy for data wrangling and analysis, and with the Matplotlib visualization library for plotting graphs.
 Performed data collection, cleaning, wrangling, analysis, and building machine learning models on
the data sets in both R and Python.
 Used Agile methodologies to emphasize face-to-face communication and ensure that each iteration passes through the full SDLC.
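
As referenced in the EDA item above, this is a small Pandas sketch of the kind of analysis described (missing values, summary statistics, and IQR-based outlier detection); the file name and the 'amount' column are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical dataset; column names are placeholders for illustration.
df = pd.read_csv("customer_orders.csv")

# Missing values per column.
print(df.isna().sum())

# Summary statistics for numeric fields.
print(df.describe())

# Simple outlier check on one numeric column using the IQR rule.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} potential outliers in 'amount'")
```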
