
Joshna

Professional Summary

● Overall 10 years of experience in the IT industry, with expertise in the Big Data/Hadoop development framework and in analysis, design, development, testing, documentation, deployment, and integration using SQL, Python, Spark, and Big Data technologies.
● Expertise in Hadoop architecture and ecosystem components such as HDFS, MapReduce, Pig, Hive, Sqoop, Flume, and Oozie.

● Complete understanding of Hadoop daemons such as JobTracker, TaskTracker, NameNode, and DataNode, and of the YARN architecture.
● Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters using distributions such as Apache Hadoop, Cloudera, and Hortonworks, and cloud services like AWS and GCP.
● Experience in installing and configuring Hadoop stack elements: MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Oozie, and Zookeeper.
● Expertise in writing custom Kafka consumer code and modifying existing producer code in Python to push data to Spark-
streaming jobs.
● Ample knowledge of Apache Kafka and Apache Storm for building data platforms, pipelines, and storage systems, and of search technologies such as Elasticsearch.
● Experience in using Elasticsearch to store, search, and analyze big volumes of data quickly and in near real time.

● Proficient with complex workflow orchestration tools, namely Oozie, Airflow, Data Pipelines, Azure Data Factory, CloudFormation, and Terraform.
● Implemented data warehouse solutions consisting of ETLs and on-premises-to-cloud migration, with good expertise in building and deploying batch and streaming data pipelines in cloud environments.
● Worked on Airflow 1.8 (Python 2) and Airflow 1.9 (Python 3) for orchestration; familiar with building custom Airflow operators and orchestrating workflows with dependencies spanning multiple clouds.
● Leveraged Spark as an ETL tool for building data pipelines on cloud platforms such as AWS EMR, Azure HDInsight, and MapR CLDB architectures.
● Spark-for-ETL practitioner, Databricks enthusiast, and cloud adoption and data engineering contributor in the open-source community.

● Proven expertise in deploying major software solutions for various high-end clients meeting business requirements such as Big
Data Processing, Ingestion, Analytics, and Cloud Migration from On-prem to Cloud.
● Proficient with Azure Data Lake Services (ADLS), Databricks and Python notebooks, Databricks Delta Lake, and Amazon Web Services (AWS).
● Experience in developing enterprise-level solutions using batch processing (using Apache Pig) and streaming framework (using
Spark Streaming, Apache Kafka & Apache Flink).
● Orchestration experience using Azure Data Factory, Airflow 1.8, and Airflow 1.10 on multiple cloud platforms, with a solid understanding of how to leverage Airflow operators.
● Knowledge of automated deployments leveraging Azure Resource Manager templates, DevOps, and Git repositories for automation and continuous integration/continuous delivery (CI/CD).
● Experienced in data processing and analysis using Spark, HiveQL, and SQL.

● Extensive experience in writing User-Defined Functions (UDFs) in Hive and Spark (see the PySpark UDF sketch at the end of this list).

● Worked on Apache Sqoop to perform importing and exporting data from HDFS to RDBMS/NoSQL DBs and vice-versa.
● Working experience on NoSQL databases like HBase, Azure Cosmos DB, MongoDB, and Cassandra, including their functionality and implementation.

● Extensive experience working with AWS Cloud services and AWS SDKs to work with services like AWS API Gateway, Lambda, S3,
IAM, and EC2.
● Extensive experience working with GCP services such as Cloud Dataflow, BigQuery, Cloud Storage, and Pub/Sub.
● Excellent understanding of Zookeeper for monitoring and managing Hadoop jobs.

● Experience with NumPy, Matplotlib, Pandas, Seaborn, and Plotly Python libraries. Worked on large datasets by using PySpark,
NumPy, and pandas.
● Utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means,
& KNN for data analysis.
● Experience with HANA SQLScript stored procedures, table functions, and dynamic privileges via SQL queries in information models.

● Experience with popular JavaScript libraries and frameworks, such as React, Angular, or Vue.js.

● Proficiency in Big Data Practices and Technologies like HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, Kafka.

● Experienced database professional with a strong background in Sybase and DB2 database management.

● Hands-on experience with ETL, Hadoop, and data governance tools such as Tableau and Informatica Enterprise Data Catalog.

● Solid experience and understanding of implementing large-scale data warehousing programs and end-to-end (E2E) data integration solutions on Snowflake Cloud, GCP, Redshift, Informatica Intelligent Cloud Services (IICS - CDI), and Informatica Power Center integrated with multiple relational databases (MySQL, Teradata, Oracle, Sybase, SQL Server, DB2).
● Experience in designing & developing applications using Big Data technologies HDFS, Map Reduce, Sqoop, Hive, PySpark & Spark
SQL, HBase, Python, Snowflake, S3 storage, and Airflow.
● Experience in performance tuning of MapReduce jobs and complex Hive queries.

● Experience in efficient ETL using Spark in-memory processing, Spark SQL, and Spark Streaming with the Kafka distributed messaging system.
● Extensive experience in the development of Bash, T-SQL, and PL/SQL scripts.

● Understanding of structured data sets, data pipelines, ETL tools, and data reduction, transformation, and aggregation techniques; knowledge of tools such as dbt and DataStage.
● Led successful data migration projects, including extraction, transformation, and loading (ETL) processes.

● Have good knowledge of Job Orchestration tools like Oozie, Zookeeper & Airflow.

● Wrote PySpark jobs in AWS Glue to merge data from multiple tables and utilized Crawlers to populate the AWS Glue Data Catalog with metadata table definitions.
● Generated scripts in AWS Glue to transfer data and utilized AWS Glue to run ETL and aggregation jobs on PySpark code.

● Applied standardization to improve Master Data Management (MDM) and address other common data management issues.

● Skilled in building and publishing customized interactive reports and dashboards with custom parameters and user filters, producing tables, graphs, and listings using tools such as Tableau.
● Practical understanding of Data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema
Modeling, Fact and Dimension tables.
● Extensive experience across both relational and non-relational databases, including Oracle, PL/SQL, SQL Server, MySQL, and DB2.
● Holds a strong ability to handle multiple priorities and workloads, and can understand and adapt to new technologies and environments quickly.
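
Illustrative PySpark UDF sketch (hypothetical data and column names, not client code), showing how a Python function can be registered for use in both the DataFrame API and Spark SQL:

    # Minimal PySpark UDF sketch; data and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf, col
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

    def normalize_state(code):
        # Trim whitespace and upper-case a state code; return None for missing values.
        return code.strip().upper() if code else None

    normalize_state_udf = udf(normalize_state, StringType())

    df = spark.createDataFrame([(" ca",), ("Ny",), (None,)], ["state_code"])
    df = df.withColumn("state_code", normalize_state_udf(col("state_code")))

    # The same function can also be registered for Spark SQL queries.
    spark.udf.register("normalize_state", normalize_state, StringType())
    df.createOrReplaceTempView("customers")
    spark.sql("SELECT normalize_state(state_code) AS state_code FROM customers").show()
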
Technical Skills:

Hadoop/Spark Ecosystem: Hadoop, MapReduce, Pig, Hive/Impala, YARN, Kafka, Flume, Oozie, Zookeeper, Spark, Airflow, MongoDB, Cassandra, HBase, and Storm

Hadoop Distribution: Cloudera and Hortonworks

Programming Languages: Scala, Hibernate, JDBC, JSON, HTML, CSS, SQL, R, Shell Scripting

Script Languages: JavaScript, jQuery, Python

Databases: Oracle, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, MS Access, Snowflake, NoSQL, HBase, MongoDB

Cloud Platforms: AWS, Azure, GCP

Distributed Messaging System: Apache Kafka

Data Visualization Tools: Tableau, Power BI, SAS, Excel, ETL

Batch Processing: Hive, MapReduce, Pig, Spark

Operating Systems: Linux (Ubuntu, Red Hat), Microsoft Windows

Reporting Tools/ETL Tools: Informatica Power Center, Tableau, Pentaho, SSIS, SSRS, Power BI

Work Experience:

Client: Edwards Lifesciences, Irvine, CA May 2019 - Present


Role: Sr. Data Engineer

Responsibilities:
● Demonstrated proficiency in Agile methodologies, working within cross-functional Agile teams to deliver data engineering
solutions on schedule and within scope.
● Actively participated in Agile ceremonies such as daily stand-ups, sprint planning, sprint reviews, and retrospectives to ensure
effective collaboration and communication within the team.
● Designed and implemented end-to-end data solutions on the Azure cloud platform, including Azure Databricks, Azure Synapse
Pipeline, and Azure Blob Storage.
● Developed and managed Azure Data Lake and Azure Blob Storage accounts to store, manage, and secure data assets.

● Created and maintained ETL pipelines using Azure Data Factory to orchestrate data workflows.

● Collaborated with cross-functional teams to understand business requirements and translated them into scalable data solutions
using Azure services.
● Worked with Azure Postgres SQL and Azure SQL databases to store and retrieve data for various applications and analytics.

● Designed and developed complex data models and database schemas in Azure SQL Database, optimizing data storage, retrieval,
and organization.
● Engineered end-to-end data pipelines using SQL for data extraction, transformation, and loading (ETL) processes, ensuring high-
quality data for analytics and reporting.
● Conducted data cleansing, transformation, and enrichment using SQL scripts to ensure data accuracy and consistency.
● Created and optimized SQL queries, stored procedures, and data processing logic for data analysis and reporting within Azure
SQL Database.
● Managed and optimized data storage, indexing, and partitioning strategies within Azure databases and data lakes for efficient
data access.
● Utilized Spark Streaming and Azure Databricks for real-time data processing and streaming analytics.

● Implemented data versioning, change tracking, and data lineage for enhanced data governance and auditing in Azure
environments.
● Developed end-to-end data pipelines in Azure Databricks, encompassing the bronze, silver, and gold stages for comprehensive data processing (a minimal bronze-to-silver sketch follows this list).
● Implemented the Bronze stage in data pipelines, focusing on raw data ingestion, storage, and initial data quality checks.

● Enhanced data quality and usability by transitioning data through the silver stage, performing data transformations,
normalization, and schema changes.
● Orchestrated data cleaning and transformation processes within Azure Databricks, ensuring the silver data was structured and
ready for analysis.
● Leveraged Databricks for advanced data transformations, including aggregations, joins, and feature engineering, to prepare data
for analytical purposes in the gold stage.
● Stored and managed gold data in Azure data warehousing solutions, optimizing data structures for high-performance querying
and reporting.
● Implemented Spark Streaming for real-time data processing and analytics, enabling immediate insights from streaming data
sources.
● Developed and maintained Spark Streaming applications to process, transform, and analyze high-velocity data streams.

● Ensured data security and access control on Azure through identity and access management, encryption, and compliance
measures.
● Developed data archiving and retention strategies in Snowflake to store historical data while optimizing storage costs.

● Created and maintained historical data models and schemas in Snowflake, ensuring data organization and indexing for efficient
querying.
● Conducted performance tuning and optimization of data processing workflows to reduce processing times and costs.

● Managed and optimized Snowflake data warehousing solutions for data storage and retrieval.

● Developed and maintained PySpark and Pandas-based data processing scripts and notebooks for data transformation and
analysis.
● Utilized PySpark to build scalable data processing applications, taking advantage of parallel processing and in-memory
computing.
● Collaborated with data scientists to enable machine learning model deployment and data access in Azure environments.

● Assisted in data migration projects, including the movement of on-premises data to Azure cloud environments.

● Worked on continuous integration and continuous deployment (CI/CD) pipelines for automated testing and deployment of data
solutions.
● Collaborated with Azure DevOps teams to ensure high availability, scalability, and resource optimization for data systems.

● Demonstrated adaptability and a commitment to staying updated on emerging Azure and data engineering technologies and
trends.
● Successfully completed data migration projects to Azure, ensuring data consistency and integrity during the transition.
● Ensured compliance with data privacy regulations and company policies, including GDPR and HIPAA, by implementing data
access controls and encryption.
● Collaborated with business analysts and stakeholders to gather data requirements and define data models that align with
business needs.
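
A minimal sketch of the bronze-to-silver step referenced above, assuming Delta Lake on Azure Databricks; the ADLS paths, schema, and column names are illustrative assumptions rather than the client's actual layout:

    # Bronze-to-silver transformation sketch (Azure Databricks + Delta Lake).
    # Storage paths and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date, trim

    spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

    bronze_path = "abfss://lake@examplestorage.dfs.core.windows.net/bronze/orders"
    silver_path = "abfss://lake@examplestorage.dfs.core.windows.net/silver/orders"

    # Bronze: raw ingested records stored as-is.
    bronze_df = spark.read.format("delta").load(bronze_path)

    # Silver: typed, cleansed, de-duplicated records ready for analysis.
    silver_df = (
        bronze_df
        .withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
        .withColumn("customer_id", trim(col("customer_id")))
        .filter(col("order_id").isNotNull())
        .dropDuplicates(["order_id"])
    )

    silver_df.write.format("delta").mode("overwrite").save(silver_path)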

Environment: Python, SQL, Azure Databricks, Azure Synapse Pipeline, Azure Blob Storage, Azure Data Lake, Terraform, Azure PostgreSQL, Azure SQL, Spark Streaming, GitHub, PyCharm, Snowflake, PySpark, Pandas

Client: Cisco/HCL, NY Dec 2016 – Mar 2018


Role: Sr Data Engineer

Responsibilities:
● Responsible for requirement gathering, understanding the business value, and providing analytical data solutions.

● Designed, implemented, and managed data pipelines on AWS using services like AWS Glue, AWS Data Pipeline, or custom ETL
scripts, ensuring smooth data flow from various sources to data destinations.
● Designed and implemented data ingestion pipelines using AWS Glue, ensuring efficient and scalable data processing.

● Developed ETL (Extract, Transform, Load) workflows in AWS Glue to cleanse, transform, and load data from various sources into
data lakes and data warehouses.
● Configured, Scheduled, and Triggered ETL jobs in ETL Orchestrator using JSON to load the source data into Data Lake.

● Created a Snowflake warehouse strategy and set it up to use PUT scripts to migrate a terabyte of data from S3 into Snowflake.

● Created a customized read/write Snowflake utility function in Python to transfer data from an AWS S3 bucket to Snowflake.
● Created S3 buckets and managed S3 bucket policies, as well as using S3 buckets for storage and backup on AWS.

● Developed ETL Processes in AWS Glue to migrate Campaign data from external sources like S3, ORC/Parquet/Text Files into AWS
Snowflake.
● Worked with GitHub to clone repositories and commit changes across code versions, and released to Bitbucket by creating pull requests for the merge process.
● Extensively worked towards performance tuning/optimization of queries, contributing to an improvement in the deployed code.

● Created a data pipeline involving various AWS services including S3, Kinesis Firehose, Kinesis Data Streams, SNS, SQS, Athena, Snowflake, etc.
● Worked on end-to-end deployment of the project that involved Data Analysis, Data Pipelining, Data Modelling, Data Reporting,
and Data documentation as per the business needs.
● Developed Spark applications using Pyspark and Spark-SQL for data extraction, transformation, and aggregation from multiple
file formats.
● Utilized AWS Glue Dynamic Frames for schema inference and dynamic data transformations, improving data processing
flexibility.
● Managed and optimized AWS Glue Data Catalog to maintain metadata and improve query performance in Athena.

● Created and managed AWS Glue Crawlers to automatically discover and catalog metadata from data sources in Amazon S3.

● Implemented data partitioning strategies in Amazon S3 to enhance query performance and reduce data processing costs.

● Leveraged AWS Glue Jobs to orchestrate and schedule ETL workflows, ensuring timely data updates.

● Developed custom PySpark scripts within AWS Glue Jobs for complex data transformations and data enrichment (see the Glue job sketch after this list).

● Worked on data lineage tracking and data quality monitoring using AWS Glue and AWS CloudWatch.
● Collaborated with Data Scientists and Analysts to provide clean and structured data for their analytical needs.

● Optimized and fine-tuned AWS Glue Jobs to reduce data processing time and costs, enhancing overall efficiency.

● Implemented data encryption and security measures for sensitive data stored in Amazon S3.

● Created and maintained AWS Glue Development Endpoints for interactive development and debugging.

● Automated data pipeline orchestration using AWS Step Functions and Lambda functions.

● Developed AWS Lambda functions for event-driven data processing and data pipeline triggers.

● Collaborated with DevOps teams to integrate AWS Glue into CI/CD pipelines for automated deployment and monitoring.

● Designed and maintained data lake architectures in Amazon S3, incorporating data partitioning and lifecycle policies.

● Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks.
● Developed and optimized Spark jobs using PySpark to extract data from AWS S3, transform it using SQL and Spark functions, and
write it to Redshift.
● Created and managed AWS Athena queries to enable self-service data exploration for business users.

● Developed complex SQL queries in Athena for ad-hoc analysis and reporting. Integrated Tableau with Athena for real-time data
visualization and reporting.
● Worked on data performance tuning in Athena, optimizing query execution plans.

● Created Tableau dashboards and reports to provide actionable insights to business stakeholders.

● Collaborated with Data Architects to design and implement data models that facilitate efficient data retrieval in Athena.

● Implemented data retention and archival strategies in S3 for compliance and cost management.

● Provided on-call support and troubleshooting for data pipeline issues, ensuring data availability and reliability.

● Documented ETL processes, data schemas, and pipeline configurations for knowledge sharing and team collaboration.

● Conducted performance testing and optimization of AWS Glue and Athena to meet SLAs.

● Stayed updated on AWS services and industry best practices to drive continuous improvement in data engineering processes.

● Collaborated with cross-functional teams to identify business requirements and translate them into data engineering solutions.

● Actively participated in AWS community forums and meetups, sharing knowledge and insights with the data engineering
community.
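
An illustrative AWS Glue job skeleton for the kind of PySpark-based merge described above; it runs only inside a Glue job, and the database, table, and bucket names are hypothetical:

    # AWS Glue job sketch: read two Data Catalog tables, join them, and write Parquet to S3.
    # Database, table, and bucket names are hypothetical.
    import sys
    from awsglue.transforms import Join
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glue_context = GlueContext(sc)
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    orders = glue_context.create_dynamic_frame.from_catalog(
        database="example_db", table_name="orders")
    customers = glue_context.create_dynamic_frame.from_catalog(
        database="example_db", table_name="customers")

    # Merge the two tables on a shared key and persist the curated result.
    merged = Join.apply(orders, customers, "customer_id", "customer_id")
    glue_context.write_dynamic_frame.from_options(
        frame=merged,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/curated/orders_customers/"},
        format="parquet",
    )
    job.commit()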

Environment: Agile, AWS Glue, S3, Amazon Athena, Lambda functions, PySpark, PL/SQL, XML, JSON, Avro, CSV, Parquet, Tableau, MySQL

Client: Bank of America/Infosys, NY Aug 2015 – Nov 2016


Role: Sr Data Engineer

Responsibilities:
● As a Big Data Developer implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing
Big Data technologies such as Hadoop, MapReduce Frameworks, MongoDB, Hive, Oozie, Flume, Sqoop, Talend, etc.
● Migrated an in-house database to AWS Cloud and designed, built, and deployed a multitude of applications utilizing the AWS stack (including EC2 and RDS), focusing on high availability and auto-scaling.
● Worked on analyzing Hadoop cluster using different big data analytic tools including Flume, Pig, Hive, HBase, Oozie, Zookeeper,
Sqoop, Spark, and Kafka.
● Developed Spark code using Python and Spark SQL/Streaming for faster testing and processing of data.

● Performed data extraction, aggregations, and consolidation of Adobe data within AWS Glue using PySpark.

● Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions, and streamed data in real time using Spark with Kafka for faster processing (see the streaming sketch after this list).
● Installed a 5-node Hortonworks cluster on AWS EC2 instances and Google Cloud, set up the Hortonworks Data Platform cluster in the cloud, and configured it as a Hadoop platform for running jobs.
● Involved in analyzing data coming from various sources and creating Meta-files and control files to ingest the data into the Data
Lake.
● Developed data pipeline programs with Spark Python APIs, performed data aggregations with Hive, and formatted data (JSON) for visualization, generating Highcharts such as outlier, data distribution, and correlation/comparison charts.
● Extensively worked on Python, built a custom ingestion framework, and developed REST APIs using Python.
● Analyzed SQL scripts and designed solutions to implement them using PySpark, creating custom new columns as required while ingesting data into the Hadoop data lake using PySpark.
● Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
● Handled importing of data from various data sources, performed transformations using Hive, and MapReduce, loaded data into
HDFS, and Extracted the data from SQL into HDFS using Sqoop.
● Installed Hadoop, Map Reduce, and HDFS and developed multiple MapReduce jobs in PIG and Hive for data cleaning and pre-
processing.
● Imported the data from different sources like HDFS/HBase into SparkRDD and configured deployed and maintained multi-node
Dev and Test Kafka Clusters.
● Virtualized servers using Docker for test and development environment needs, and automated configuration using Docker containers.
● Created Elastic MapReduce (EMR) clusters and configured the data pipeline with EMR clusters for scheduling the task runner and provisioning EC2 instances on both Windows and Linux.
● Involved in converting MapReduce programs into Spark transformations using Spark RDD's on Scala and developed Spark scripts
by using Scala Shell commands as per the requirement.
● Performed transformations, cleaning, and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.

● Design & implement ETL process using Talend to load data from Worked extensively with Sqoop for importing and exporting the
data from HDFS to Relational Database systems/mainframe and vice-versa. Loading data into HDFS.
● Created and altered HBase tables on top of data residing in Data Lake and Created external Hive tables on the Blobs to
showcase the data to the Hive Meta Store.
● Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
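
A minimal sketch of the Spark-with-Kafka streaming pattern mentioned above, written with the Structured Streaming API; the broker address, topic, schema, and paths are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath:

    # Spark Structured Streaming sketch: consume JSON events from Kafka and land them in HDFS.
    # Broker, topic, schema, and paths are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("amount", DoubleType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "example-events")
           .load())

    # Kafka delivers bytes; cast the value to a string and parse the JSON payload.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    (events.writeStream
     .format("parquet")
     .option("path", "hdfs:///data/landing/events")
     .option("checkpointLocation", "hdfs:///checkpoints/events")
     .start()
     .awaitTermination())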

Environment: Hadoop, Data Lake, JavaScript, Python, HDFS, Spark, AWS Redshift, AWS Glue, Lambda, MapReduce, Pig, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Java, SQL Scripting, Talend, PySpark, Linux Shell Scripting, Kinesis, Docker, Zookeeper, EC2, EMR, S3, Oracle, MySQL.

Client: Darden Restaurants/Infosys, India Aug 2014 – July 2015


Role: Data Engineer

Responsibilities:
● Involved in understanding the requirements of the End Users/Business Analysts and Developed Strategies for ETL processes.

● Proficient in designing and implementing data integration solutions using Informatica PowerCenter, IICS, Edge, EDC, and IDQ.

● Extracted data from various sources and performed transformations to clean, structure, and prepare data for visualization using
Power Query in Power BI.
● Developed complex DAX calculations and measures to derive meaningful insights and key performance indicators (KPIs) in
Power BI.
● Developed and optimized data pipelines using Google Dataflow to efficiently process, transform, and load healthcare data from various sources into Google Cloud Storage (GCS) and BigQuery, ensuring data quality, scalability, and performance of the pipelines to meet the client's needs (a minimal Dataflow sketch follows this list).
● Facilitated the integration and migration of healthcare data from on-premises systems or other cloud platforms to Google Cloud
Platform (GCP), ensuring seamless and secure transfer while adhering to data privacy regulations such as HIPAA. Utilized GCP
Dataflow for real-time or batch data ingestion, transformation, and synchronization.
● Designed and implemented data warehousing solutions on BigQuery to provide healthcare clients with a centralized repository
for storing and analyzing large volumes of structured and unstructured data. Optimized data models, partitioning strategies, and
indexing to support complex queries and enable advanced analytics for insights generation.
● Utilized GCP Dataprep for data preparation and cleansing tasks, ensuring that healthcare data was cleansed, standardized, and
enriched to maintain accuracy and consistency. Implemented data quality checks and validation processes to identify and
resolve anomalies or discrepancies in the data before loading it into the target systems.
● Monitored the performance and health of data pipelines, storage, and analytical processes using GCP monitoring and logging
tools. Proactively identified bottlenecks, latency issues, or resource constraints and implemented optimizations or adjustments
to improve overall system efficiency, reliability, and cost-effectiveness.
● Developed mappings/Reusable Objects/Transformation by using a mapping designer, and transformation developer in
Informatica Power Center.
● Designed and developed ETL Mappings to extract data from flat files, and Oracle to load the data into the target database.

● Extensive experience in designing, developing, and implementing ETL processes using Informatica PowerCenter.

● Proven ability to analyze complex data requirements and design efficient and scalable ETL workflows to meet business
objectives.
● Expertise in data profiling, data cleansing, and data quality management using Informatica Data Quality (IDQ) to ensure data
accuracy and consistency.
● Skilled in utilizing Tivoli Workload Scheduler (TWS) for job scheduling and automation in data integration processes.

● Proficient in creating mappings, workflows, and sessions in Informatica PowerCenter for data extraction, transformation, and
loading.
● Developed Informatica Mappings for the complex business requirements provided using different transformations like
Normalizer, SQL Transformation, Expression, Aggregator, Joiner, Lookup, Sorter, Filter, Router, and so on.
● Used ETL to load data using PowerCenter/Power Connect from source systems like Flat Files and Excel Files into staging tables
and load the data into the target database.
● Developed complex mappings using multiple sources and targets in different databases, and flat files.

● Developed SQL queries to develop the Interfaces to extract the data in regular intervals to meet the business requirements.

● Extensively used Informatica Client Tools Source Analyzer, Warehouse Designer, Transformation Developer, Mapping Designer,
Mapplet Designer, Informatica Repository.
● Worked on creating Informatica mappings with different transformations like lookup, SQL, Normalizer, Aggregator, SQ, Joiner,
Expression, Router, etc.
● Designed and developed Informatica ETL Interfaces to load data incrementally from Oracle databases and Flat files into staging
schema.
● Used various transformations like Unconnected/Connected Lookup, Aggregator, Expression Joiner, Sequence Generator, Router
etc.
● Responsible for the development of Informatica mappings and tuning for better performance.

● Created transformations like Expression, Lookup, Joiner, Rank, Update Strategy, and Source Qualifier transformation using the
Informatica designer.
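
An illustrative Apache Beam (Python SDK) pipeline of the kind that runs on Google Dataflow to load files from GCS into BigQuery; the bucket, project, dataset, and schema are hypothetical placeholders, not the client's actual data model:

    # Apache Beam sketch: read CSV lines from GCS, parse them, and append rows to BigQuery.
    # Bucket, project, dataset, and schema are hypothetical; pass --runner=DataflowRunner
    # (plus project/region/temp_location options) to execute on Google Dataflow.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_line(line):
        # Hypothetical record layout: patient_id,visit_date,charge
        patient_id, visit_date, charge = line.split(",")
        return {"patient_id": patient_id, "visit_date": visit_date, "charge": float(charge)}

    options = PipelineOptions()

    with beam.Pipeline(options=options) as p:
        (p
         | "ReadFromGCS" >> beam.io.ReadFromText("gs://example-bucket/claims/*.csv",
                                                 skip_header_lines=1)
         | "ParseCSV" >> beam.Map(parse_line)
         | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
               "example-project:healthcare.claims",
               schema="patient_id:STRING,visit_date:DATE,charge:FLOAT",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
               create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))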

Environment: Informatica Power Center 10.4/10.2, Power BI, Oracle, GCP, Flat Files, SQL, and Windows.

Client: Royal Bank of Scotland/Infosys, India Feb 2012 – July 2014


Role: Data Engineer

Responsibilities:
● Designed DTS/SSIS packages to transfer data between servers, load data into the database, archive data files from different
DBMSs using SQL Enterprise Manager/SSMS on SQL Server 2008 environment and deploy the data.
● Designed and implemented effective data models in Power BI, ensuring optimal relationships between tables for accurate
reporting.
● Worked with business users, business analysts, IT leads and developers in analyzing business requirements and translating
requirements into functional and technical design specifications.
● Created SSIS packages to capture the daily maintenance plan's scheduled job status (success/failure) in a daily status report. Also created an SSIS package to list the server configuration, database sizing, non-DB-owned objects, and public role privileges required for the monthly report per the audit requirement.
● Worked with SQL Server and T-SQL in constructing DDL/DML triggers, tables, user-defined functions, views, indexes, Stored
Procedures, user profiles, relational database models, cursors, Common Table Expression CTEs, data dictionaries, and data
integrity.
● Worked closely with the team in designing, developing, and implementing the logical and physical model for the Data mart.

● Identify and resolve database performance issues, database capacity issues, and other distributed data issues.

● Designed the ETL (Extract, Transform, and Load) strategy to transfer data from source to stage and from stage to target tables in the data warehouse and OLAP database from heterogeneous databases using SSIS and DTS (Data Transformation Services).
● Performed the ongoing delivery, migrating client mini-data warehouses or functional data marts from different environments to
MS SQL servers.
● Involved in the creation of dimension and fact tables based on the business requirements.

● Developed SSIS packages to export data from Excel Spreadsheets to SQL Server, automated all the SSIS packages, and monitored
errors using SQL Job daily.
● Prepared reports using SSRS (SQL Server Reporting Services) to highlight discrepancies between customer expectations and customer service efforts, which involved scheduling subscription reports via the subscription report wizard.
● Created reports and dashboards, scorecards, KPIs, using SSRS, SharePoint 2013, Excel Services, and data-driven email reports.

● Created and managed SharePoint subsites, pages, lists, reports, and dashboards.

● Created dynamic parameterized MDX queries for SSRS reporting from cube data.

● Performed Data Analysis, Dimensional Modeling, Database Development, and ETL to support a data mart and cube.

● Involved in generating and deploying various reports using global variables, expressions, and functions using SSRS.
● Migrated data from legacy system text-based files, Excel spreadsheets, and Access to SQL Server databases using DTS and SQL Server Integration Services (SSIS) to overcome the transformation constraints.
● Automated the SSIS jobs in the SQL scheduler as SQL server agent job for daily, weekly, and monthly loads.

● Designed reports using SQL Server Reporting Services (SSRS) based on OLAP cubes, making use of multiple-value selection in parameter pick lists, cascading prompts, dynamic matrix reports, and other reporting services features.

Environment: T-SQL, SQL Server 2008/ 2008R2, Power BI, SSRS, SSIS, SSAS, MS Visio, BIDS, Agile.
