Bikash Jha - CV
I’m a driven data engineer with 4+ years of experience working with large datasets. I have experience and knowledge in a diverse set of disciplines, technologies and tools, including data science, cloud and data pipelines, and data architecture.
Looking for a career opportunity to apply my skills and experience to challenging projects. Eager to build robust databases that lay the groundwork for game-changing business insights, making extensive use of current IT techniques to contribute productively to the growth of the company while growing professionally.
Skills
Serverless, Data Pipelines
Data Technologies: Python 2/3, Spark, Spark Streaming, Elastic, Kubernetes, Docker, Airflow
Cloud: AWS, GCP
Databases: MySQL, MongoDB, DynamoDB, PostgreSQL, HBase
Big Data: Kafka, MapR/Cloudera, Hive, Zookeeper, Oozie, HBase
Experience
Planet Labs, Berlin, Germany
Senior Data Engineer
OCT 2021 - CURRENT
○ Geospatial Infrastructure & Data Platform Management:
○ Orchestrated the design and management of a comprehensive infrastructure to ingest and process geospatial data from a
constellation of 250+ satellites.
○ Created specialized data structures to accommodate both vector (e.g., feature collections, multipolygons, points) and raster
(e.g., satellite imagery, elevation models) geospatial datasets.
○ Real-Time Processing & Advanced Geospatial Operations:
○ Leveraged PubSub and Apache Spark for real-time geospatial data streaming and on-the-fly analytics.
○ Performed spatial joins, proximity analysis, and business analytics to contextualize incoming satellite data.
○ Data Storage, Schema Design & Spatial Indexing:
○ Designed schemas in BigQuery and Bigtable, incorporating spatial indices for optimized query performance on petabytes of
geospatial data.
○ Implemented a data lake on Google Cloud Storage (GCS) to store GeoParquet files.
○ Geospatial Query Development & Spatial Analysis:
○ Formulated complex SQL and spatial SQL queries in BigQuery and Bigtable to extract, transform, and analyze geospatial
data.
○ Employed advanced geospatial techniques such as geostatistical analysis and business modeling to provide deep insights
for CSMs.
○ Pipeline Orchestration, Backfill & Spatial ETL Operations:
○ Deployed Apache Airflow on a GKE cluster, integrating geospatial ETL (Extract, Transform, Load) processes with gcsfuse
and PubSub.
○ Managed backfill operations, ensuring the spatial integrity and accuracy of geospatial datasets over time.
○ Legacy Support, Enhancements & Monitoring:
○ Maintained and enhanced legacy batch pipelines on Apache Beam
○ Set up Grafana and created metrics in it (LogQL, Stackdriver, Prometheus).
○ Integrated Slack with Grafana and Airflow.
Tech Stack: GCP, BigQuery, Bigtable, Spark, PubSub, Kubernetes, Python, Golang, Airflow, QGIS, Mapbox
Homelike Internet GmbH, Cologne, Germany
Senior Data Engineer
OCT 2021 - CURRENT
○ Managing six ETL processes that transfer data between MongoDB, BigQuery, Salesforce and several marketing channels
○ Designing a serverless framework for real-time consumption of user-tracking data using Amazon Kinesis
○ Implementing a dead-letter queue (DLQ) for the Kinesis data stream (Kinesis to Lambda to SQS)
○ Conceptualising a new microservices architecture using:
○ A Kubernetes cluster on Amazon EKS (Elastic Kubernetes Service)
○ The ELK stack (Elasticsearch, Logstash, Kibana) for monitoring
○ Airflow on Kubernetes with git sync
○ Kafka on K8s
○ Spark on K8s to run pyspark jobs
Tech Stack: GCP, BigQuery, AWS, Lambda, Kinesis, SQS, CloudWatch, Kubernetes, Python
Tech Stack: Python, Amazon S3, Azure Blob, Cosmos DB, MongoDB, Kafka, Kubernetes, PySpark
Education
University of Kalyani, West Bengal
Bachelor of Technology in Information Technology
2016
Awards: Employee of the Month (2017), Employee of the Quarter (Q2 2018), on-site opportunities in Mexico for AT&T and
in Manila for PLDT.
Reference
Available upon request