Arnab Paul
TECHNICAL SKILLS
ETL Tools AWS Glue, Azure Data Factory, GCP Data Fusion & Dataflow, Airflow, Spark,
Sqoop, Flume, Apache Kafka, Spark Streaming, Apache NiFi, Microsoft
SSIS, Informatica PowerCenter & IICS, IBM DataStage
NoSQL Databases MongoDB, Cassandra, Amazon DynamoDB, HBase, GCP Datastore
Data Warehouse AWS RedShift, Google Cloud Storage, Snowflake, Teradata, Azure Synapse
SQL Databases Oracle DB, Microsoft SQL Server, IBM DB2, PostgreSQL, Teradata, Azure SQL
Database, Amazon RDS, GCP Cloud SQL, GCP Cloud Spanner
Hadoop Distribution Cloudera, Hortonworks, MapR, AWS EMR, Azure HDInsight, GCP Dataproc
Hadoop Tools HDFS, HBase, Hive, YARN, MapReduce, Pig, Apache Storm, Sqoop,
Oozie, Zookeeper, Spark, SOLR, Atlas
Programming & Scripting Spark Scala, Python, Java, MySQL, PostgreSQL, Shell Scripting, Pig Latin,
HiveQL
Visualization Tableau, Looker, QuickSight, QlikView, Power BI, Grafana, Python
Libraries
AWS EC2, S3, Glacier, Redshift, RDS, EMR, Lambda, Glue, CloudWatch,
Rekognition, Kinesis, CloudFront, Route53, DynamoDB, CodePipeline, EKS,
Athena, QuickSight
Azure DevOps, Synapse Analytics, Data Lake Analytics, Databricks, Blob Storage,
Azure Data Factory, SQL Database, SQL Data Warehouse, Cosmos DB
Google Cloud Platform Compute Engine, Cloud Storage, Cloud SQL, Cloud Datastore, BigQuery,
Pub/Sub, Dataflow, Dataproc, Data Fusion, Data Catalog, Cloud Spanner,
Atom
Web Development HTML, XML, JSON, CSS, jQuery, JavaScript
Monitoring Tools Splunk, Chef, Nagios, ELK
Source Code Management JFrog Artifactory, Nexus, GitHub, CodeCommit
Containerization Docker & Docker Hub, Kubernetes, OpenShift
Build & Development Tools Jenkins, Maven, Gradle, Bamboo
Methodologies Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE
Data Engineer at CarMax | Richmond, Virginia Jan 2021 – Jul 2021
CarMax specializes in used-vehicle retail in the US. I joined their inventory management team, and as a Data
Engineer I worked with on-premises Hadoop data infrastructure, AWS cloud architecture & cloud migration to
AWS.
Responsibilities:
Designed and built scalable distributed data solutions, along with a migration plan for moving the existing
on-premises Cloudera Hadoop distribution to AWS, based on business requirements.
Worked with legacy on-premises VMs based on UNIX distributions. Worked with batch data as well as
3rd-party data delivered through FTP. Configured traditional ETL tools such as Informatica.
Worked with HDFS for distributed data storage. Configured Oozie along with Sqoop to move distributed
data with encryption. Implemented data transfers using Apache NiFi.
Wrote YAML files for Kafka producers for ingesting streaming data. Assigned partitions to consumers.
Developed Scala scripts using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data
aggregation and queries, writing data back into the OLTP system through Sqoop.
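For illustration, a minimal PySpark sketch of the same aggregate-and-write-back pattern (the original scripts were in Scala; the table and column names here are hypothetical):

    # PySpark sketch: aggregate raw events and persist to a Hive staging table
    # that a Sqoop export job could then push to the OLTP system.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("inventory-aggregation").getOrCreate()

    # Hypothetical Hive table of raw inventory events
    events = spark.table("inventory_db.vehicle_events")

    daily_counts = (
        events
        .groupBy("store_id", F.to_date("event_ts").alias("event_date"))
        .agg(F.count("*").alias("event_count"))
    )

    # Staging table for the downstream Sqoop export
    daily_counts.write.mode("overwrite").saveAsTable("inventory_db.daily_event_counts")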
Set up static and dynamic resource pools using YARN in Cloudera Manager for job scheduling & cluster
resource management. Used Zookeeper for configuration management, synchronization & other
services.
Performed data profiling, modeling and metadata management tasks on complex data integration
scenarios, adhering to enterprise data governance and data integration standards using Apache Atlas.
Developed the core search module using Apache Solr and customized Solr to handle fallback
searching and to provide custom functions. Worked with big data tools to integrate Solr search.
Worked with Snowflake to integrate Power BI and Solr for dashboard visualizations.
Utilized Apache Spark ML & MLlib with Python to develop and execute machine learning jobs.
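A minimal pyspark.ml sketch of the kind of Spark ML pipeline referenced above (the feature table and columns are hypothetical):

    # Assemble numeric features and fit a simple regression model with Spark ML.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()
    df = spark.table("inventory_db.pricing_features")  # hypothetical feature table

    assembler = VectorAssembler(inputCols=["mileage", "age_months"], outputCol="features")
    lr = LinearRegression(featuresCol="features", labelCol="sale_price")

    model = Pipeline(stages=[assembler, lr]).fit(df)
    model.transform(df).select("sale_price", "prediction").show(5)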
Part of On-premises Hadoop infrastructure to AWS EMR Migration (Refactoring) Team.
Built data pipelines with data governance for real-time & batch processing using Lambda/Kappa
architectures.
Worked with the S3DistCp tool to copy data from HDFS to S3 buckets. Created a custom
utility tool in Hive that targets and deletes backed-up folders that are manually flagged.
Implemented auto-migration of JSON-format data from a MongoDB server: created the replication
server and defined the source and target endpoints.
Created S3 & EMR endpoints using PrivateLink & used an AWS private subnet network for fast transfers.
Used Spark scripts implemented on EMR to automate the comparison & validation of S3 files against the
original HDFS files.
Developed a custom Spark framework to load data from AWS S3 into Redshift for data warehousing.
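A sketch of the usual S3-to-Redshift load step such a framework would issue (a COPY from S3 via psycopg2; cluster endpoint, table, bucket and IAM role are placeholders):

    # Run a Redshift COPY that ingests curated Parquet files from S3.
    import psycopg2

    COPY_SQL = """
        COPY analytics.vehicle_inventory
        FROM 's3://example-bucket/curated/vehicle_inventory/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS PARQUET;
    """

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="dev", user="etl_user", password="***",
    )
    with conn, conn.cursor() as cur:
        cur.execute(COPY_SQL)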
Used Jenkins for project deployments and helped deploy projects on Jenkins using Git. Used
Docker to achieve delivery goals in a scalable environment & used Kubernetes for orchestration
automation.
Integrated Apache Airflow and wrote scripts to automate workflows.
Created RESTful APIs using Flask to integrate functionalities & communicate with other applications.
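A minimal Flask sketch of such a REST endpoint (routes and payload shape are hypothetical):

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/jobs/<job_id>/status", methods=["GET"])
    def job_status(job_id):
        # In practice this would look the job up in a metadata store
        return jsonify({"job_id": job_id, "status": "RUNNING"})

    @app.route("/jobs", methods=["POST"])
    def submit_job():
        payload = request.get_json(force=True)
        # Hand the request off to the downstream pipeline here
        return jsonify({"accepted": True, "job": payload}), 202

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)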
Integrated the Teradata warehouse into the EMR cluster. Developed BTEQ scripts to load data from the
Teradata staging area to the Teradata DataMart. Handled errors & tuned performance in Teradata queries
and utilities.
Defined all settings for an EMR cluster (hardware provisioning, security, & other setup elements) as
infrastructure-as-code, checked in and managed with source control.
Helped develop source control on CodeCommit, with CodeBuild and CodeDeploy, using CloudFormation.
Implemented Athena as a replacement for the Hive query engine. Migrated analysts, end-users, and
other processes (automated Tableau and other dashboard tools) to query S3 directly.
Integrated Git into Jenkins to automate the code check-out process. Used Jenkins for automating
builds and deployments.
Designed deployment strategies using CI/CD pipelines with remote execution. Ensured zero downtime
using a blue/green deployment strategy and shortened deployment cycles through Jenkins automation.
Worked in all areas of Jenkins: setting up CI for new branches, build automation, plugin management,
securing Jenkins, and setting up master/slave configurations.
Worked on IAM, KMS, Secrets Manager, Config, Systems Manager and others for security and access
management.
Implemented AWS RDS to store relational data & integrated it with Elastic Load Balancing.
Implemented Kinesis on EMR for streaming analysis. Migrated existing clickstream Spark jobs to
Kinesis.
Used AWS Glue as the new ETL tool. Used Glue crawlers to catalog the data from S3 and performed
ETL operations on it. Implemented scripts on Glue for data transformation, validation, and data
cleansing.
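A sketch of a typical Glue PySpark job over a crawled catalog table (database, table, column and bucket names are placeholders, not the actual job):

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the table the crawler registered in the Glue Data Catalog
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="vehicle_events"
    )

    # Basic validation/cleansing: drop rows missing the primary key
    df = dyf.toDF().filter("vehicle_id IS NOT NULL")

    df.write.mode("overwrite").parquet("s3://example-bucket/cleansed/vehicle_events/")
    job.commit()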
Used DNS management in Route53 & configured CloudFront for access to media files (images). Worked
with AWS Rekognition for real-time content filtering. Wrote Lambda functions to resize/scale images.
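A sketch of an S3-triggered Lambda that resizes images with Pillow and boto3 (bucket names and target size are illustrative):

    import io
    import boto3
    from PIL import Image

    s3 = boto3.client("s3")
    THUMB_BUCKET = "example-media-thumbnails"

    def handler(event, context):
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]

            obj = s3.get_object(Bucket=bucket, Key=key)
            image = Image.open(io.BytesIO(obj["Body"].read())).convert("RGB")
            image.thumbnail((400, 400))  # resize in place, keeping aspect ratio

            buf = io.BytesIO()
            image.save(buf, format="JPEG")
            s3.put_object(Bucket=THUMB_BUCKET, Key=key, Body=buf.getvalue())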
Worked closely with the GCP team to provide support for data pipelines running BigQuery/Atom jobs.
Worked with tools such as Athena and QuickSight for data cleansing and dashboards.
Environment: Hadoop (HDFS, HBase, Hive, YARN, MapReduce, Pig, Apache Storm, Sqoop, Oozie,
Zookeeper, Spark, SOLR, Atlas), AWS (EC2, S3, Redshift, RDS, EMR, Lambda, Glue, CloudWatch, Rekognition,
Kinesis, CloudFront, Route53, DynamoDB, CodePipeline, Athena, QuickSight), Python, MongoDB, Cassandra,
Snowflake, Airflow, Tableau
Data Engineer at CapitalOne | Capgemini India | Mumbai, India Jun 2016 – July 2019
Capgemini SE is an IT Services & Consulting company with diverse clients from all over the globe. I worked
for CapitalOne, where I was part of the team that handled the data architecture. I worked with Python,
Java applications, Hadoop & a variety of ETL tools including Informatica & Apache Sqoop/Flume.
Responsibilities:
Analyzed, designed, and built scalable distributed data solutions using Hadoop, AWS & GCP.
Worked on multi-tier applications using AWS services (EC2, Route53, S3, RDS, DynamoDB, SNS, SQS,
IAM), focusing on high availability, fault tolerance, and auto-scaling.
Participated in documenting procedures for the smooth transfer of the project from the development to
the testing environment and then moving the code to production.
Worked to implement persistent storage in AWS using EBS, S3, and Glacier. Created volumes
and configured snapshots for EC2 instances. Also built and managed Hadoop clusters on AWS.
Used Spark in Scala to convert distributed data into named columns (DataFrames) & helped develop
predictive analytics.
Developed Scala scripts using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data
aggregation and queries, writing data back into the OLTP system through Sqoop.
Developed Hive queries to pre-process the data required for running business processes.
Implemented multiple generalized solution models using AWS SageMaker.
Extensive expertise using the core Spark APIs and processing data on an EMR cluster.
Worked on Hive queries and Spark to create HBase tables to load large sets of structured, semi-
structured and unstructured data coming from UNIX systems, NoSQL databases, and a variety of portfolios.
Worked on ETL migration services by developing and deploying AWS Lambda functions that generate
metadata which can be written to the Glue Catalog and queried from Athena.
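A sketch of a Lambda-style handler that queries a Glue-cataloged table through Athena with boto3 (database, query and output location are placeholders):

    import boto3

    athena = boto3.client("athena")

    def handler(event, context):
        response = athena.start_query_execution(
            QueryString="SELECT portfolio, COUNT(*) AS n FROM trades GROUP BY portfolio",
            QueryExecutionContext={"Database": "curated_db"},
            ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
        )
        # The query runs asynchronously; poll the execution id for results
        return {"query_execution_id": response["QueryExecutionId"]}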
Programmed in Hive, Spark, Java, and Python to streamline & orchestrate the incoming data and build
data pipelines that surface useful insights.
Loaded data into Spark and used in-memory data computation to generate the output response, storing
the resulting datasets in HDFS/Amazon S3 storage/relational databases.
Migrated legacy Informatica batch/real-time ETL logic to Hadoop using Python, Spark Context,
Spark SQL, DataFrames and RDDs in Databricks.
Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in
Spark, effective & efficient joins, and transformations during the ingestion process itself.
Worked on tuning Spark applications to set Batch Interval time, level of Parallelism and memory
tuning.
Implemented near-real-time data processing using StreamSets and the Spark/Databricks framework.
Stored Spark Datasets into Snowflake relational databases & used data for Analytics reports.
Migrated a SQL Server database into a multi-cluster Snowflake environment, created data sharing across
multiple applications, and created StreamSets pipelines based on data volume/jobs.
Developed Apache Spark jobs using Python in the test environment for faster data processing and used
Spark SQL for querying.
Used Spark containers for validating data loads in test/dev environments.
Worked on Metapipeline to source tables and to deliver calculated ratio data from AWS to Datamart
(SQL Server) & Credit Edge server.
Worked in tuning relational databases (Microsoft SQL Server, Oracle, MySQL, PostgreSQL) and columnar
databases (Amazon Redshift, Microsoft SQL Data Warehouse).
Hands-on experience in Amazon EC2, Amazon S3, Amazon RedShift, Amazon EMR, Amazon RDS, Amazon
ELB, Amazon CloudFormation, and other services of the AWS family.
Developed job processing scripts using Oozie workflows. Experienced in scheduling & job management.
Wrote Jenkins Groovy scripts for automating workflows including ingestion, ETL and reporting to
dashboards.
Worked on development of scheduled jobs using commands/BASH shell in UNIX.
Environment: Hadoop (Hive, Sqoop, Pig), AWS (EC2, S3, RedShift, EMR, EBS), Java, Python 3.3, Django, Flask,
XML, MySQL, MSSQL Server, Shell Scripting, MongoDB, Cassandra, Docker, Jenkins, JIRA, jQuery
Python Developer at Juniper Networks | Tech Mahindra | Hyderabad, India June 2014 – May 2016
HDFC Bank is a leading private sector bank in India which offers a wide range of banking products and financial
services like investment banking, life insurance and asset management. I was involved in Python web application
development (both front-end & back-end), database management, testing and deployment. I also worked with
UNIX Bash shell scripting for job scheduling & maintenance.
Responsibilities:
Created APIs, database models and views using Python to build a responsive web
application.
Worked on a fully automated continuous integration system using Git, Gerrit, Jenkins, MySQL and in-
house tools developed in Python and Bash.
Participated in the SDLC of a project including Design, Development, Deployment, Testing and
Support.
Deployed & troubleshot applications used as a data source for both customers and the internal service team.
Wrote and executed MySQL queries from Python using the Python MySQL connector and MySQLdb package.
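A minimal sketch of that query pattern with mysql.connector (connection details and table are placeholders):

    import mysql.connector

    conn = mysql.connector.connect(
        host="localhost", user="app_user", password="***", database="app_db"
    )
    cursor = conn.cursor()
    cursor.execute(
        "SELECT id, status FROM test_runs WHERE created_at >= %s", ("2016-01-01",)
    )
    for run_id, status in cursor.fetchall():
        print(run_id, status)
    cursor.close()
    conn.close()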
Implemented front-end website development using CSS, HTML, JavaScript and jQuery.
Worked on a Python/Django based web application with PostgreSQL DB and integrated with third party
email, messaging& storage services.
Developed GUI using webapp2 for dynamically displaying the test block documentation and other
features of Python code using a web browser.
Involved in design, implementation and modifying back-end Python code and MySQL database schema.
Developed user friendly graphical representation of item catalogue configured for specific equipment.
Used Beautiful Soup for web scraping to extract data & generated various capacity planning reports
(graphical) using Python packages like NumPy and matplotlib.
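A sketch of the scraping-plus-reporting pattern (the URL, HTML structure and use of requests are assumptions for illustration):

    import requests
    from bs4 import BeautifulSoup
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    html = requests.get("https://example.com/capacity").text
    soup = BeautifulSoup(html, "html.parser")

    # Assume each row exposes a numeric utilisation value in <td class="usage">
    values = [float(td.get_text()) for td in soup.select("td.usage")]

    plt.plot(values)
    plt.title("Capacity utilisation")
    plt.xlabel("Sample")
    plt.ylabel("Usage (%)")
    plt.savefig("capacity_report.png")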
Automated different workflows, which were previously initiated manually, with Python scripts and UNIX shell
scripting.
Fetched Twitter feeds for certain important keywords using the Twitter Python API.
Used Shell Scripting for UNIX Jobs which included Job scheduling, batch-job scheduling, process control,
forking and cloning and checking status.
Monitored Python scripts that are run as daemons on UNIX to collect trigger and feed arrival
information.
Used JIRA for bug & issue tracking and added algorithms to application for data and address generation.
Environment: Python 2.7 (BeautifulSoup, NumPy, matplotlib), Web Development (CSS, HTML, JavaScript,
jQuery), Database (MySQL, PostgreSQL), UNIX/Linux Shell Script, JIRA, Jenkins, GIT.
Education:
2014, Bachelor of Computer Science