Bikash Jha CV 2024
I’m a driven data engineer with 7+ years of experience working with big datasets. I have experience and knowledge in a diverse set of disciplines, technologies, and tools, including data science, cloud and data pipelines, and data architecture.
Looking for a career opportunity to apply my skills and experience to challenging projects. Eager to build robust databases that lay the groundwork for game-changing business insights, making extensive use of current IT techniques to contribute productively to the company's growth while growing professionally.

Serverless, Data Pipelines
Data Technologies: Python 2/3, Golang, Spark, Spark Streaming, Elastic, Kubernetes, Docker, Airflow
Cloud: AWS, GCP
Databases: MySQL, MongoDB, DynamoDB, PostgreSQL, HBase
Big Data: Kafka, MapR/Cloudera, Hive, ZooKeeper, Oozie, HBase
Monitoring: Grafana, Loki, Prometheus, StackDriver Logging
Experience
Planet Labs, Berlin, Germany
Senior Data Engineer
OCT 2021 - CURRENT
● Geospatial Infrastructure & Data Platform Management:
○ Orchestrated the design and management of a comprehensive infrastructure to ingest and process geospatial data from a
constellation of 250+ satellites.
○ Created specialized data structures to accommodate both vector (e.g., feature collections, multipolygons, points) and raster
(e.g., satellite imagery, elevation models) geospatial datasets.
● Real-Time Processing & Advanced Geospatial Operations:
○ Leveraged PubSub/PubSub-Lite and Apache Spark for real-time geospatial data streaming and on-the-fly analytics.
○ Performed spatial joins, proximity analysis, and business analytics, contextualizing incoming satellite data.
● Data Storage, Schema Design & Spatial Indexing:
○ Designed schemas in BigQuery and Bigtable, incorporating spatial indices for optimized query performance on petabytes of
geospatial data.
○ Implemented a data lake on Google Cloud Storage (GCS) to store GeoParquet files.
● Geospatial Query Development & Spatial Analysis:
○ Formulated complex SQL and spatial SQL queries in BigQuery and Bigtable to extract, transform, and analyze geospatial
data.
○ Employed advanced geospatial techniques such as geostatistical analyses and business modeling to provide deep insights
for CSMs.
● Pipeline Orchestration, Backfill & Spatial ETL Operations:
○ Deployed Apache Airflow on a GKE cluster, integrating geospatial ETL DAGs with gcsfuse and PubSub.
○ Managed backfill operations with Airflow, ensuring the spatial integrity and accuracy of geospatial datasets over time.
○ Deployed Spark on GKE (Kubernetes) to run Spark jobs as pods with autoscaling enabled.
● Reporting & Backend Integration:
○ Golang-Based Reporting System: Implemented a Golang and GORM-based backend for dynamic reporting.
○ Real-Time Transaction Processing and Data Integration: Employed Golang to process and integrate millions of transactional
events in real-time, ensuring instant calculations and seamless data flow from PubSub into PostgreSQL and Redis.
○ Developed a Golang library for geometry validation, supporting various shapes such as multipolygons, features, and feature
collections.
● Legacy Support, Enhancements & Monitoring:
○ Maintained and enhanced legacy batch pipelines on Apache Beam.
○ Set up Grafana and created metrics in Grafana using LogQL, StackDriver, and Prometheus.
○ Integrated Grafana and Airflow with Slack.
○ GitLab CI/CD, Terraform
Tech Stack: GCP, BigQuery, Bigtable, Spark, PubSub, Kubernetes, Python, Golang, Airflow, QGIS, Mapbox, PubSub-Lite, GitLab
CI, Terraform, Postgres, Redis
Tech Stack: GCP, BigQuery, AWS, Lambda, Kinesis, SQS, CloudWatch, Kubernetes, Python
Tech Stack: Python, Amazon S3, Azure Blob, Cosmos DB, MongoDB, Kafka, Kubernetes, PySpark
Education
University of Kalyani, West Bengal
Bachelor of Technology in Information Technology
2016
Awards: Employee of the Month (2017); Employee of the Quarter (Q2 2018). On-site opportunities: Mexico for AT&T and
Manila for PLDT.