Data Engineer Resume v1.1

Abhiram Y

abhirama1y@gmail.com | (669) 369-9966 | github.com/neveram

Summary
Resourceful Data Engineer with over four years of experience. Skilled in programming languages including Python, SQL, and Java, and proficient in data tools and technologies such as Snowflake, Airflow, Spark, Kafka, and AWS. Demonstrated expertise in data warehousing, ETL processing, data wrangling, and advanced statistical analysis.

Skills
Programming Languages: Python, SQL, Java, Bash/Shell scripting
Data & ETL Tools: Snowflake, Airflow, Informatica, dbt, Tableau, Power BI, Excel
Frameworks: Spark (Databricks), Hadoop, Kafka, pandas, NumPy, PyTorch, LLMs, RAG
Databases: MS SQL Server, PostgreSQL, Cassandra, MongoDB, Redis, Firebase
Cloud: AWS (S3, Glue, Redshift), Azure (Data Factory, Synapse Analytics), GCP (BigQuery, Looker, Pub/Sub)
DevOps & CI/CD: Docker, Kubernetes, Terraform, Jenkins
Certifications: AWS Certified Solutions Architect, Apache Airflow Certified

Experience
Software Engineer - Data, Ulytics May 2022 – Present
Responsibilities:
● Designed and developed Spark ETL pipelines to consume batch data from cloud-based sources such as Dynamics CRM, applied business logic to transform Parquet and text files, loaded the data into Delta tables, and enabled self-service reporting and advanced analytics
● Developed Python scripts using the pandas library to automate metadata setup in the report log database, saving the data science and analytics teams over 100 hours while meeting evolving business requirements
● Reduced ETL job failures by 50% by identifying and resolving SQL query mismatches and schema errors in SSIS and optimizing query execution times, significantly accelerating data-driven analysis
● Implemented a medallion architecture to logically organize data in the data lake/lakehouse, incrementally and progressively improving the structure and quality of data as it flows through each layer of the architecture
● Architected a diagnostic framework to address extreme changes in data, reducing mean time to detect anomalies by 35% and ensuring data integrity and reliability
● Optimized a legacy Spark pipeline, enhancing performance with tuning techniques such as mitigating skewed joins, using broadcast joins, repartitioning, and coalescing partitions (a minimal sketch follows below)
● Orchestrated full daily ETL loads across various teams, leveraging Airflow with the Celery Executor and enhancing data integration by dynamically fetching configurations from an internal service registry
● Established secure access to S3 buckets and created IAM roles and policies across AWS accounts using the AWS SDK (Boto3), reducing service costs by 15% through optimal resource allocation strategies for privileged document access
● Established a CI/CD pipeline integrating Terraform and AWS for efficient deployment of ML project artifacts to Databricks, significantly accelerating iteration cycles
● Enhanced project outcomes through meticulous A/B testing and hypothesis validation using SQL and statistical
frameworks, leading to data-driven recommendations that improved project performance by an average of 20%
● Streamlined Linux server upgrades by encapsulating legacy applications in Docker containers and automating their configuration with Bash scripting
● Produced insights pivotal to corporate strategy by creating predictive models, interactive data visualizations, and Power BI dashboards that translated complex findings into actionable recommendations for stakeholders
Environment: Spark (Databricks), Delta Lake, Airflow (Celery Executor), AWS (S3, IAM, Boto3), SSIS, Terraform, Docker, Python, SQL, Power BI
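
A minimal PySpark sketch of the tuning techniques cited above (broadcasting a small dimension to sidestep a skew-prone shuffle join, repartitioning on a well-distributed key before a wide aggregation, and coalescing before the write). Paths, column names, and partition counts are hypothetical placeholders rather than details of the actual pipeline:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("tuning_sketch").getOrCreate()

    # Hypothetical inputs: a large, skew-prone fact table and a small dimension table
    orders = spark.read.parquet("s3://example-bucket/bronze/orders/")
    customers = spark.read.parquet("s3://example-bucket/bronze/dim_customer/")

    # Broadcast the small dimension so the join avoids a full shuffle (and the skew it would amplify)
    joined = orders.join(F.broadcast(customers), "customer_id", "left")

    # Repartition on a well-distributed key before the wide aggregation
    daily = (
        joined.repartition(200, "order_date")
        .groupBy("order_date")
        .agg(F.sum("amount").alias("daily_amount"))
    )

    # Coalesce to a handful of partitions before writing to avoid many small output files
    daily.coalesce(16).write.mode("overwrite").parquet("s3://example-bucket/gold/daily_amount/")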

Project Engineer - Data, Wipro Technologies Jun 2019 – Dec 2021
Responsibilities:
● Developed and applied UDFs in Python and SQL to dissect and interpret complex datasets, regularly extracting at least three significant insights each quarter that spurred growth and innovation initiatives
● Developed Python scripts to parse XML and JSON files from multiple source systems and load the data into S3 and the Snowflake data warehouse (a minimal sketch follows below)
● Designed and implemented end-to-end data pipelines in Azure Synapse Analytics, utilizing PySpark/SparkSQL for complex
queries and Spark applications within Synapse Spark pools, culminating in data loading into dedicated SQL Pools
● Reduced monthly support tickets by 30% by redesigning a workflow that cut background jobs by 25%, collaborating with stakeholders to deliver within a tight deadline
● Managed data warehousing by creating fact and dimension tables with SCD Type 2 in Azure Synapse, leveraging PolyBase for high-efficiency data loads from data lake storage into Synapse tables, outperforming traditional bulk load methods
● Optimized the data alarm engine by migrating from Azure VMs to a serverless architecture with Azure Event Hubs, improving the engine's availability by 30%
● Developed ADF (Azure Data Factory) pipelines, integrating Linked Services, Datasets, and Pipelines for ETL processes across
various sources including Azure SQL, Blob Storage, Azure SQL Data Warehouse, and write-back tools for efficient data
management and backward compatibility
● Automated data pipeline execution in Azure Data Factory by configuring schedules and event-based triggers to meet
specific business requirements, optimizing for data arrival and time-based operations
● Streamlined data transfers and ensured data integrity using SQL Server and CDC (Change Data Capture) logic; automated full/incremental loads and enforced MySQL delete cascade rules via Azure Synapse pipelines
● Created clustered and non-clustered indexes in data warehouse tables for faster data retrieval in complex analytical and reporting queries
● Utilized HBase to create tables and efficiently manage large volumes of semi-structured data, leveraging both the HBase
shell and API for data ingestion and manipulation
● Designed comprehensive data models (Conceptual, Logical, Physical) using Erwin and E/R Studio tools, and developed an
OLAP system for enhanced data analysis
● Efficiently extracted data from diverse sources including databases, flat files, APIs, and cloud applications using Informatica,
leveraging its connectors and tools for optimal data retrieval
● Developed and maintained interactive dashboards using Power BI, Tableau, and Excel, leading to a 20% uplift in data-centric decision-making across various teams
● Reduced data anomalies by 10% and raised data accuracy and integrity standards through the effective application of
robust statistical methods for validating data hypotheses
Environment: Azure (Data Factory, Synapse Analytics, Event Hubs), PySpark/SparkSQL, Snowflake, AWS S3, SQL Server, HBase, Informatica, Power BI, Tableau, Erwin, Python, SQL
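
A minimal Python sketch of the parse-and-load pattern described above (newline-delimited JSON landed in S3 with boto3, then loaded into Snowflake with COPY INTO via snowflake-connector-python). File names, bucket, stage, table, and credentials are hypothetical placeholders:

    import json

    import boto3
    import snowflake.connector

    # Parse newline-delimited JSON from a hypothetical source extract
    with open("events.json") as f:
        records = [json.loads(line) for line in f]
    clean = [{"id": r["id"], "ts": r["timestamp"], "amount": r.get("amount", 0)} for r in records]

    # Land the cleaned records in S3 as newline-delimited JSON
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="example-landing-bucket",
        Key="events/2021/06/events.json",
        Body="\n".join(json.dumps(r) for r in clean).encode("utf-8"),
    )

    # Load from an external stage pointing at the bucket into a Snowflake table
    conn = snowflake.connector.connect(
        account="example_account", user="example_user", password="***",
        warehouse="LOAD_WH", database="ANALYTICS", schema="RAW",
    )
    conn.cursor().execute(
        "COPY INTO raw.events FROM @events_stage/events/2021/06/ FILE_FORMAT = (TYPE = 'JSON')"
    )
    conn.close()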

Projects
Distributed Data Stream Pipeline [Python, Kafka, Spark, Airflow, Cassandra] github.com/neveram/Data-Pipeline
● Developed an end-to-end ETL pipeline integrating real-time streaming from a REST API endpoint (a minimal sketch follows the project list)
YOLO-Tennis [PyTorch, pandas, NumPy, OpenCV] github.com/neveram/YOLO-Tennis
● Developed a computer vision project to analyze tennis players' movements, speeds, and shots using object trackers across frames of a live tennis match
E-Shopify [Rails, Hotwire, Tailwind CSS, Stripe, PostgreSQL] github.com/neveram/E-Shopify
● Developed a full-stack web application for an online store with dynamic UI and robust payment processing
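
A minimal Spark Structured Streaming sketch for the Distributed Data Stream Pipeline project (consuming JSON events from Kafka and parsing them into a typed DataFrame). Broker address, topic name, and schema are hypothetical placeholders, a console sink stands in for the project's Cassandra sink, and the spark-sql-kafka package is assumed to be on the classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("stream_sketch").getOrCreate()

    # Hypothetical schema for the JSON payloads on the topic
    schema = StructType([
        StructField("user_id", StringType()),
        StructField("event_time", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read the raw stream from Kafka and parse the JSON value column
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "user_events")
        .load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Console sink here; the project writes to Cassandra via the spark-cassandra-connector
    query = events.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()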

Education
San Jose State University – MS in Software Engineering 2023
Vardhaman College of Engineering – B.Tech in Computer Science 2019
