Shweta Sakhale
(+91)-8828396084
shwetasakhale13@gmail.com
PROFESSIONAL SUMMARY
❖ 2.5+ years of experience in designing and developing Big Data applications using HDFS,
Hive, Sqoop, Spark DSL, and AWS.
❖ Understands the complex data-processing needs of big data and has experience developing
code to address those needs.
❖ Strong understanding of Spark SQL integration with other big data technologies, such as Hadoop
and Hive, and its impact on data processing workflows and performance.
❖ Experience working with Spark SQL in production environments and implementing performance
monitoring and alerting systems to detect and resolve performance issues proactively.
❖ Proficient in processing serialized data in Spark using various formats, such as Avro, Parquet,
ORC, CSV, and text files, and familiar with their features and limitations.
❖ Experienced in using Spark serialization libraries, such as Java serialization, to optimize data
serialization and deserialization performance.
❖ Skilled in working with binary and textual data formats in Spark, such as CSV, JSON, and XML, and
their serialization and deserialization using Spark DataFrames and RDDs.
❖ Maintained and monitored Spark clusters on AWS EMR, ensuring high availability and fault
tolerance.
❖ Optimized Spark jobs and data processing workflows for scalability, performance, and cost
efficiency using techniques such as partitioning, compression, and caching.
❖ Designed and developed Spark applications to implement complex data transformations and
aggregations for batch processing jobs, leveraging Spark SQL and DataFrames.
❖ Able to collaborate with stakeholders and perform source-to-target data mapping, design and
review.
❖ Strong problem-solving and analytical skills, with a readiness to innovate in order to perform better.
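The batch transformation, aggregation, caching, and partitioned-write patterns summarized above can be sketched as follows (a minimal Scala illustration; the input path, column names, and output location are hypothetical assumptions, not taken from any actual project described here):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object BatchAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-aggregation-sketch")
      .getOrCreate()

    // Hypothetical input: Parquet order records on S3/HDFS.
    val orders = spark.read.parquet("s3://example-bucket/orders/")

    // Cache the cleaned DataFrame because several aggregations reuse it.
    val cleaned = orders
      .filter(col("amount").isNotNull)
      .withColumn("amount", col("amount").cast("double"))
      .cache()

    // Batch transformation + aggregation with the DataFrame DSL.
    val dailyTotals = cleaned
      .groupBy(col("order_date"), col("region"))
      .agg(sum("amount").as("total_amount"), count("*").as("order_count"))

    // Write results partitioned and compressed for scalability and cost efficiency.
    dailyTotals.write
      .mode("overwrite")
      .partitionBy("order_date")
      .option("compression", "snappy")
      .parquet("s3://example-bucket/daily_totals/")

    spark.stop()
  }
}
```

Caching the intermediate DataFrame avoids recomputing the filter and cast for each downstream action, and partitioning the output by date keeps later date-bounded reads cheap.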
TECHNICAL SKILLS
PROFESSIONAL EXPERIENCE
PROJECT
❖ Experienced in efficiently using Hive managed and external tables according to business
requirements.
❖ Expertise in utilizing Spark RDD transformations and actions for processing large-scale
structured and unstructured datasets, including tasks like filtering, mapping, reducing, grouping,
and aggregating data.
❖ Skilled in employing Spark RDD persistence and caching mechanisms to minimize data
processing overhead and enhance query performance.
❖ Familiarity with Spark RDD lineage and fault tolerance mechanisms and their impact on the
reliability and performance of data processing.
❖ Knowledge of Spark RDD optimization techniques, such as data partitioning, shuffle tuning, and
pipelining, and their effects on query performance and resource utilization.
❖ Ability to troubleshoot common issues with Spark RDD, such as data processing errors,
performance bottlenecks, and limitations in scalability.
❖ Experience working with Spark RDD in production environments and implementing proactive
performance monitoring and alerting systems to identify and resolve performance issues.
❖ Knowledge of best practices in data engineering and data science domains for Spark RDD, such
as data preprocessing, feature engineering, model training, and inference.
❖ Skilled in employing Spark DataFrame persistence and caching mechanisms to minimize data
processing overhead and enhance query performance.
❖ Familiarity with Spark DataFrame schema and data type operations, such as adding, renaming,
and dropping columns, casting data types, and handling null values.
❖ Familiarity with Spark DataFrame APIs and SQL syntax, and the ability to write complex SQL
queries and DataFrame operations to address business problems.
❖ Expertise in using Spark SQL to process large-scale structured and semi-structured datasets,
including tasks like querying, filtering, mapping, reducing, grouping, and aggregating data.
❖ Skilled in employing Spark SQL persistence and caching mechanisms to minimize data
processing overhead and enhance query performance.
❖ Familiarity with Spark SQL schema and data type operations, such as creating, modifying, and
dropping tables, views, and indexes, and handling null values.
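The RDD transformation, action, persistence, and repartitioning techniques listed above can be illustrated with a short sketch (Scala; the input path, record format, and partition count are illustrative assumptions only):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object RddProcessingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-processing-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical text input: one "userId,action" record per line.
    val lines = sc.textFile("hdfs:///data/events/")

    // Transformations: filter malformed rows, map to key/value pairs.
    val pairs = lines
      .map(_.split(","))
      .filter(_.length == 2)
      .map(fields => (fields(0), 1L))

    // Repartition for parallelism, then persist because the reduced RDD
    // is reused by two actions below (avoids recomputing the lineage).
    val counts = pairs
      .repartition(8)
      .reduceByKey(_ + _)
      .persist(StorageLevel.MEMORY_AND_DISK)

    // Actions: both read from the persisted RDD.
    val distinctUsers = counts.count()
    val topUsers = counts.sortBy(-_._2).take(10)

    println(s"distinct users: $distinctUsers")
    println(s"top users: ${topUsers.mkString(", ")}")

    counts.unpersist()
    spark.stop()
  }
}
```

Persisting before the two actions is the key step: without it, Spark would replay the full lineage (read, split, filter, shuffle) once per action.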
Responsibilities:
Technologies: HDFS, Hive, SQL, DSL, Sqoop, Spark, Scala, Python, AWS
WORK EXPERIENCE
EDUCATION
I hereby declare that all the information in this document is accurate and true to the best of
my knowledge.
Yours sincerely,