Shweta Sakhale
(+91)-8828396084
shwetasakhale13@gmail.com
PROFESSIONAL SUMMARY
❖ 2.5+ years of experience in designing and developing Big Data applications using HDFS,
Hive, Sqoop, Spark DSL, and AWS.
❖ Understands the complex data-processing needs of big data and has experience developing
code to address those needs.
❖ Strong understanding of Spark SQL integration with other big data technologies, such as Hadoop
and Hive, and its impact on data processing workflows and performance.
❖ Experience working with Spark SQL in production environments and implementing performance
monitoring and alerting systems to detect and resolve performance issues proactively.
❖ Proficient in processing serialized data in Spark using various formats, such as Avro, Parquet,
ORC, CSV, and text files, and familiar with their features and limitations.
❖ Experienced in using Spark serialization libraries, such as Java serialization, to optimize data
serialization and deserialization performance.
❖ Skilled in working with binary and textual data formats in Spark, such as CSV, JSON, and XML, and
their serialization and deserialization using Spark DataFrames and RDDs.
❖ Maintained and monitored Spark clusters on AWS EMR, ensuring high availability and fault
tolerance.
❖ Optimized Spark jobs and data processing workflows for scalability, performance, and cost
efficiency using techniques such as partitioning, compression, and caching.
❖ Designed and developed Spark applications to implement complex data transformations and
aggregations for batch processing jobs, leveraging Spark SQL and DataFrames.
❖ Able to collaborate with stakeholders and perform source-to-target data mapping, design and
review.
❖ Strong problem-solving and analytical skills, with a readiness to innovate in order to perform better.
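The batch transformation, aggregation, caching, and partitioned-write patterns summarized above can be sketched as follows (a minimal Scala illustration; the input path, column names, and output location are hypothetical assumptions, not taken from any actual project described here):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object BatchAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-aggregation-sketch")
      .getOrCreate()

    // Hypothetical input: Parquet order records on S3/HDFS.
    val orders = spark.read.parquet("s3://example-bucket/orders/")

    // Cache the cleaned DataFrame because several aggregations reuse it.
    val cleaned = orders
      .filter(col("amount").isNotNull)
      .withColumn("amount", col("amount").cast("double"))
      .cache()

    // Batch transformation + aggregation with the DataFrame DSL.
    val dailyTotals = cleaned
      .groupBy(col("order_date"), col("region"))
      .agg(sum("amount").as("total_amount"), count("*").as("order_count"))

    // Write results partitioned and compressed for scalability and cost efficiency.
    dailyTotals.write
      .mode("overwrite")
      .partitionBy("order_date")
      .option("compression", "snappy")
      .parquet("s3://example-bucket/daily_totals/")

    spark.stop()
  }
}
```

Caching the intermediate DataFrame avoids recomputing the filter and cast for each downstream action, and partitioning the output by date keeps later date-bounded reads cheap.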
TECHNICAL SKILLS
PROFESSIONAL EXPERIENCE
PROJECT
❖ Experienced in efficiently using Hive managed and external tables according to business
requirements.
❖ Expertise in utilizing Spark RDD transformations and actions for processing large-scale
structured and unstructured datasets, including tasks like filtering, mapping, reducing, grouping,
and aggregating data.
❖ Skilled in employing Spark RDD persistence and caching mechanisms to minimize data
processing overhead and enhance query performance.
❖ Familiarity with Spark RDD lineage and fault tolerance mechanisms and their impact on the
reliability and performance of data processing.
❖ Knowledge of Spark RDD optimization techniques, such as data partitioning, shuffle tuning, and
pipelining, and their effects on query performance and resource utilization.
❖ Ability to troubleshoot common issues with Spark RDD, such as data processing errors,
performance bottlenecks, and limitations in scalability.
❖ Experience working with Spark RDD in production environments and implementing proactive
performance monitoring and alerting systems to identify and resolve performance issues.
❖ Knowledge of best practices in data engineering and data science domains for Spark RDD, such
as data preprocessing, feature engineering, model training, and inference.
❖ Skilled in employing Spark DataFrame persistence and caching mechanisms to minimize data
processing overhead and enhance query performance.
❖ Familiarity with Spark DataFrame schema and data type operations, such as adding, renaming,
and dropping columns, casting data types, and handling null values.
❖ Familiarity with Spark DataFrame APIs and SQL syntax, and the ability to write complex SQL
queries and DataFrame operations to address business problems.
❖ Expertise in using Spark SQL to process large-scale structured and semi-structured datasets,
including tasks like querying, filtering, mapping, reducing, grouping, and aggregating data.
❖ Skilled in employing Spark SQL persistence and caching mechanisms to minimize data
processing overhead and enhance query performance.
❖ Familiarity with Spark SQL schema and data type operations, such as creating, modifying, and
dropping tables, views, and indexes, and handling null values.
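The RDD transformation, action, persistence, and repartitioning techniques listed above can be illustrated with a short sketch (Scala; the input path, record format, and partition count are illustrative assumptions only):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object RddProcessingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-processing-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical text input: one "userId,action" record per line.
    val lines = sc.textFile("hdfs:///data/events/")

    // Transformations: filter malformed rows, map to key/value pairs.
    val pairs = lines
      .map(_.split(","))
      .filter(_.length == 2)
      .map(fields => (fields(0), 1L))

    // Repartition for parallelism, then persist because the reduced RDD
    // is reused by two actions below (avoids recomputing the lineage).
    val counts = pairs
      .repartition(8)
      .reduceByKey(_ + _)
      .persist(StorageLevel.MEMORY_AND_DISK)

    // Actions: both read from the persisted RDD.
    val distinctUsers = counts.count()
    val topUsers = counts.sortBy(-_._2).take(10)

    println(s"distinct users: $distinctUsers")
    println(s"top users: ${topUsers.mkString(", ")}")

    counts.unpersist()
    spark.stop()
  }
}
```

Persisting before the two actions is the key step: without it, Spark would replay the full lineage (read, split, filter, shuffle) once per action.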
Responsibilities:
Technologies: HDFS, Hive, SQL, DSL, Sqoop, Spark, Scala, Python, AWS
WORK EXPERIENCE
EDUCATION
I hereby declare that all the information in this document is accurate and true to the best of
my knowledge.
Yours sincerely,