GCP Cloud


Job description

JD GCP Data Engineering


 Between 2 and 4 years' experience in GCP Data Engineering.
 Design, develop, and maintain data pipelines using GCP services.
 Strong data engineering experience using Python, PySpark, or Spark on Google Cloud.
 Hands-on experience handling big data.
 Strong communication skills.
 Experience with Agile methodologies and with ETL, ELT, data movement, and data processing.
 Google Cloud Professional Data Engineer certification is an added advantage.
 Proven analytical skills and a problem-solving attitude.
 Ability to function effectively in a cross-team environment.
Primary Skills
 GCP data engineering
 Programming experience in Python/PySpark and SQL; Spark on GCP
 GCS (Cloud Storage), Composer (Airflow), and BigQuery experience
 Experience building data pipelines using the above skills.
 Pipeline development experience using Dataflow or Dataproc (Apache Beam, etc.); a minimal Beam sketch follows this list.
 Experience with GCP services and databases such as Cloud SQL, Datastore, Bigtable, Spanner, Cloud Run, Cloud Functions, etc.
 Proven analytical skills and a problem-solving attitude.
 Excellent communication skills
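
For illustration only, a minimal Apache Beam pipeline of the kind a Dataflow engineer might write, reading CSV lines from Cloud Storage and loading them into BigQuery. The project, bucket, dataset, table, and column names are placeholder assumptions, not references to a real environment.

    # Minimal sketch of a Dataflow-style pipeline: GCS CSV -> BigQuery.
    # All resource names below are placeholders.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_csv(line):
        # Assumes a simple "id,name,amount" layout with no quoted commas.
        id_, name, amount = line.split(",")
        return {"id": int(id_), "name": name, "amount": float(amount)}

    options = PipelineOptions(
        runner="DataflowRunner",          # "DirectRunner" works for local testing
        project="my-project",             # placeholder project ID
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromGCS" >> beam.io.ReadFromText("gs://my-bucket/input/*.csv", skip_header_lines=1)
            | "ParseCSV" >> beam.Map(parse_csv)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:my_dataset.my_table",
                schema="id:INTEGER,name:STRING,amount:FLOAT",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )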

Job description
Responsibilities:
• Strong development skills in Python.
• Writing effective and scalable Python code.
• Strong experience in processing data and drawing insights from large data sets.
• Good familiarity with one or more libraries: pandas, NumPy, SciPy, etc.
• In-depth knowledge of spaCy and similar NLP libraries such as NLTK and textacy (a minimal sketch follows this list).
• Experience with Python development environments and tooling, including but not limited to Jupyter and Google Colab notebooks, Matplotlib, Plotly, and geoplotlib.
• Advanced working knowledge of SQL, experience with relational databases and query authoring, and working familiarity with a variety of databases.
• Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
• Strong analytical skills for working with unstructured datasets.
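
As a rough illustration of the pandas and spaCy skills above, a minimal sketch that loads a CSV with pandas and extracts named entities with spaCy. The file name, column name, and model choice are assumptions for illustration.

    # Minimal sketch: load tabular data with pandas, extract named entities with spaCy.
    # "reviews.csv" and the "text" column are placeholders.
    import pandas as pd
    import spacy

    nlp = spacy.load("en_core_web_sm")   # assumes this model has been downloaded

    df = pd.read_csv("reviews.csv")

    def extract_entities(text):
        doc = nlp(text)
        return [(ent.text, ent.label_) for ent in doc.ents]

    df["entities"] = df["text"].astype(str).apply(extract_entities)
    print(df[["text", "entities"]].head())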
Good to have some exposure to:
• Experience with setting up and maintaining data warehouses (Google BigQuery, Redshift, Snowflake) and data lakes (GCS, AWS S3, etc.) for an organization (see the load sketch after this list).
• Experience with relational SQL and NoSQL databases, including Postgres and Cassandra/MongoDB.
• Experience with data pipeline and workflow management tools: Airflow, Dataflow, Dataproc, etc.
• Exposure to any Business Intelligence (BI) tools like Tableau, Dundas, Power BI, etc.
• Agile software development methodologies.
• Working in multi-functional, multi-location teams.
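
As a sketch of the data warehouse and data lake exposure above, loading Parquet files from a GCS data lake into a BigQuery table with the google-cloud-bigquery client. Project, dataset, table, and bucket names are placeholders.

    # Minimal sketch: load Parquet files from a GCS data lake into BigQuery.
    # Project, dataset, table, and bucket names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )

    load_job = client.load_table_from_uri(
        "gs://my-data-lake/sales/*.parquet",
        "my-project.analytics.sales",
        job_config=job_config,
    )
    load_job.result()   # wait for the load job to finish
    print(client.get_table("my-project.analytics.sales").num_rows, "rows loaded")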
4+ years' experience developing Big Data & Analytics solutions
 Experience building data lake solutions leveraging Google data products (e.g. Dataproc, AI Building Blocks, Looker, Cloud Data Fusion, Dataprep, etc.), Hive, Spark
 Experience with relational SQL and NoSQL databases
 Experience with Spark (Scala/Python/Java) and Kafka (see the streaming sketch after this list)
 Working experience with Databricks (Data Engineering and Delta Lake components)
 Experience with source control tools such as GitHub and related development processes
 Experience with workflow scheduling tools such as Airflow
 In-depth knowledge of any scalable cloud vendor (GCP preferred)
 Has a passion for data solutions
 Strong understanding of data structures and algorithms
 Strong understanding of solution and technical design
 Has a strong problem-solving and analytical mindset
 Experience working with Agile teams.
 Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders
 Able to quickly pick up new programming languages, technologies, and frameworks
 Bachelor's degree in Computer Science
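
To illustrate the Spark, Kafka, and Databricks items above, a minimal PySpark Structured Streaming sketch that reads JSON events from Kafka and appends them to a Delta table. The broker address, topic, schema, and paths are placeholder assumptions, and the Kafka and Delta connectors are assumed to be available (as on a Databricks cluster).

    # Minimal sketch: stream JSON events from Kafka into a Delta table with PySpark.
    # Broker, topic, schema, and paths are placeholders; assumes the Kafka and
    # Delta connectors are on the classpath (e.g. a Databricks cluster).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

    schema = StructType([
        StructField("event_id", StringType()),
        StructField("user_id", StringType()),
        StructField("event_time", TimestampType()),
    ])

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker-1:9092")   # placeholder broker
        .option("subscribe", "events")                        # placeholder topic
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    (
        events.writeStream.format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/events")
        .outputMode("append")
        .start("/tmp/delta/events")
    )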

 4+ years of experience in Data Engineering and in building and maintaining large-scale data pipelines.
 Experience with designing and implementing a large-scale data lake on cloud infrastructure.
 Strong technical expertise in Python, SQL, and shell scripting.
 Extremely well-versed in Google Cloud Platform, including BigQuery, Cloud Storage, Cloud Composer, Dataproc, Dataflow, and Pub/Sub.
 Experience with Big Data tools such as Hadoop and Apache Spark (PySpark).
 Experience developing DAGs in Apache Airflow 1.10.x or 2.x (a minimal DAG sketch follows this list).
 Good problem-solving skills.
 Detail-oriented.
 Strong analytical skills working with a large store of databases and tables.
 Ability to work with geographically diverse teams.
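
A minimal Airflow 2.x DAG sketch for the Composer/Airflow requirement above, running one daily BigQuery job. The DAG ID, schedule, SQL, and table names are placeholders, and the apache-airflow-providers-google package is assumed to be installed (as it is on Cloud Composer).

    # Minimal Airflow 2.x DAG sketch: one daily BigQuery job.
    # DAG ID, schedule, SQL, and table names are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    with DAG(
        dag_id="daily_sales_rollup",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        rollup = BigQueryInsertJobOperator(
            task_id="rollup_sales",
            configuration={
                "query": {
                    "query": """
                        SELECT sale_date, SUM(amount) AS total
                        FROM `my-project.analytics.sales`
                        GROUP BY sale_date
                    """,
                    "useLegacySql": False,
                }
            },
        )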
Good to Have:
 Certification in GCP services.
 Experience with Kubernetes.
 Experience with Docker.
 Experience with CircleCI for deployment.
 Experience with Great Expectations.
Responsibilities:
 Build data and ETL pipelines in GCP.
 Support migration of data to the cloud using Big Data technologies such as Spark, Hive, Talend, and Java.
 Interact with customers on a daily basis to ensure smooth engagement.
 Responsible for timely and quality deliveries.
 Fulfill organizational responsibilities: share knowledge and experience with other groups in the organization and conduct various technical training sessions.

Job description
Greetings from HCL!
We are looking for a GCP Data Engineer for the Chennai location.
Experience: 4+ years
Skills:
Hands-on experience in Google Cloud (BigQuery)
Strong SQL programming knowledge and hands-on experience on real-time projects.
Good data analysis and problem-solving skills
Good communication skills; a quick learner

If you are interested, please share your resume with jyothiveerabh.akula@hcl.com

Roles and Responsibilities


In this role, the GCP Data Engineer is responsible for the following:
 Design, develop, test, and implement technical solutions using GCP data technologies/tools.
 Develop data solutions in distributed microservices and full stack systems.
 Utilize programming languages like Python and Java and GCP technologies like BigQuery, Dataproc, Dataflow, Cloud SQL, Cloud Functions, Cloud Run, Cloud Composer, Pub/Sub, and APIs.
 Lead performance engineering and ensure the systems are scalable.
Desired Candidate Profile
Technology & Engineering Expertise
 Overall 5+ years of experience in implementing data solutions using cloud/on-prem technologies.
 At least 3 years of experience in data pipeline development using GCP cloud technologies.
 Proficient in data ingestion, storage, and processing using GCP technologies like BigQuery, Dataproc, Dataflow, Cloud SQL, Cloud Functions, Cloud Run, Cloud Composer, Pub/Sub, and APIs (a Pub/Sub sketch follows this list).
 Proficient in pipeline development using ELT and ETL approaches.
 Experience in microservices implementations on GCP.
 Knowledge of master data management.
 Knowledge of Data Catalog, data governance, and data security.
 Excellent SQL skills.
 Must be Google certified.
 Experience with different development methodologies (RUP | Scrum | XP).
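
To illustrate the Pub/Sub item above, a minimal sketch that publishes one message and pulls it back with the google-cloud-pubsub client. The project, topic, and subscription IDs are placeholders.

    # Minimal Pub/Sub sketch: publish one message, then pull it from a subscription.
    # Project, topic, and subscription IDs are placeholders.
    import json
    from google.cloud import pubsub_v1

    project_id = "my-project"
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, "orders")

    payload = json.dumps({"order_id": 123, "status": "NEW"}).encode("utf-8")
    publisher.publish(topic_path, payload).result()   # wait for the publish ack

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(project_id, "orders-sub")

    response = subscriber.pull(request={"subscription": subscription_path, "max_messages": 10})
    for received in response.received_messages:
        print(received.message.data)
        subscriber.acknowledge(
            request={"subscription": subscription_path, "ack_ids": [received.ack_id]}
        )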
Soft skills

Desired Candidate Profile


We are looking for a GCP Data Engineer (full time/part time).
Must be very strong in Python, GCP, Dataflow, BigQuery, data processing, ETL, and APIs.

Candidates with BE/BTech/MCA/MSc and the required experience. Tech stack: BigQuery, any ETL tool (Informatica, Talend, DataStage), Dataflow, Dataproc.
• 3-5 years' experience in data warehouse and data lake implementation.
• 1-2 years of experience in Google Cloud Platform (especially BigQuery).
• 1-2 years of working experience converting ETL jobs (in Informatica/Talend/DataStage) into Dataflow or Dataproc and migrating them into a CI/CD pipeline.
• Design, develop, and deliver data integration/data extraction solutions using IBM DataStage or other ETL tools and data warehouse platforms like Teradata and BigQuery.
• Proficiency in Linux/Unix shell scripting and SQL (see the query sketch after this list).
• Knowledge of data modelling, database design, and the data warehousing ecosystem.
• Ability to troubleshoot and solve complex technical problems.
• Excellent analytical and problem-solving skills.
• Knowledge of working in Agile environments.
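
To illustrate the SQL and BigQuery skills listed above, a minimal parameterized query sketch using the google-cloud-bigquery client. The table and column names are placeholders.

    # Minimal sketch: run a parameterized BigQuery query and print the results.
    # Table and column names are placeholders.
    import datetime

    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
        SELECT customer_id, SUM(amount) AS total_spend
        FROM `my-project.analytics.orders`
        WHERE order_date >= @start_date
        GROUP BY customer_id
        ORDER BY total_spend DESC
        LIMIT 10
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start_date", "DATE", datetime.date(2024, 1, 1))
        ]
    )

    for row in client.query(sql, job_config=job_config):
        print(row.customer_id, row.total_spend)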

Essential Skills
    3+ years' experience in developing large-scale data pipelines in at least one cloud service: Azure, AWS, or GCP
    Expertise in one or more of the following skill areas (database + ETL/pipeline + visualization/reporting) - Azure: Synapse, ADF, HDInsight; AWS: Redshift, Glue, EMR
    Highly proficient in one or more market-leading ETL tools like Informatica, DataStage, SSIS, Talend, etc.
    Fundamental knowledge of data warehouse/data mart architecture and modelling
    Define and develop data ingest, validation, and transform pipelines.
    Fundamental knowledge of distributed data processing and storage
    Fundamental knowledge of working with structured, unstructured, and semi-structured data
    For a cloud data engineer, experience with ETL/ELT patterns, preferably using Azure Data Factory and Databricks jobs
    Nice to have: on-premise platform understanding covering one or more of the following: Teradata, Cloudera, Netezza, Informatica, DataStage, SSIS, BODS, SAS, Business Objects, Cognos, MicroStrategy, WebFocus, Crystal
Essential Qualification
    BE/BTech in Computer Science, Engineering, or a relevant field

Responsibilities:

1. Data Migration: Collaborate with cross-functional teams to migrate data from various sources to
GCP. Develop and implement efficient data migration strategies, ensuring data integrity and security
throughout the process.

2. Data Pipeline Development: Design, develop, and maintain robust data pipelines that extract,
transform, and load (ETL) data from different sources into GCP. Implement data quality checks and
ensure scalability, reliability, and performance of the pipelines.
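
As a minimal illustration of the data quality checks mentioned in point 2, a small pandas-based validation step of the kind that could sit inside such a pipeline. The column names and rules are illustrative assumptions, not prescribed checks.

    # Minimal sketch of a data quality gate inside a pipeline step.
    # Column names and rules are illustrative assumptions.
    import pandas as pd

    def validate(df: pd.DataFrame) -> None:
        errors = []
        if df.empty:
            errors.append("no rows extracted")
        if df["order_id"].isna().any():            # placeholder key column
            errors.append("null order_id values found")
        if df["order_id"].duplicated().any():
            errors.append("duplicate order_id values found")
        if (df["amount"] < 0).any():               # placeholder numeric column
            errors.append("negative amounts found")
        if errors:
            raise ValueError("data quality check failed: " + "; ".join(errors))

    df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 5.5, 7.25]})
    validate(df)   # raises if any rule fails; otherwise the load step proceeds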

3. Data Management: Build and maintain data models and schemas in GCP, ensuring optimal storage,
organization, and accessibility of data. Collaborate with data scientists and analysts to understand
their data requirements and provide solutions to meet their needs.

4. GCP Data Service Expertise: Utilize your deep understanding of GCP data services, including BigQuery, Dataproc, and other relevant big data services, to architect and implement efficient and scalable data solutions. Stay up to date with the latest advancements in GCP data services and recommend innovative approaches to leverage them.

5. Performance Optimization: Identify and resolve performance bottlenecks within the data pipelines
and GCP data services. Optimize queries, job configurations, and data processing techniques to
improve overall system efficiency.
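
As one concrete example of the query optimization described in point 5 (an illustrative assumption about approach, not a prescribed solution), creating a date-partitioned, clustered BigQuery table so that filtered queries scan fewer bytes. All names are placeholders.

    # Minimal sketch: create a date-partitioned, clustered BigQuery table so that
    # queries filtering on order date / customer scan fewer bytes. Names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()

    ddl = """
        CREATE TABLE IF NOT EXISTS `my-project.analytics.orders_optimized`
        PARTITION BY DATE(order_ts)
        CLUSTER BY customer_id AS
        SELECT * FROM `my-project.analytics.orders_raw`
    """
    client.query(ddl).result()

    # Downstream queries that filter on the partition column now prune partitions, e.g.:
    #   SELECT customer_id, SUM(amount) FROM `my-project.analytics.orders_optimized`
    #   WHERE DATE(order_ts) = "2024-06-01" GROUP BY customer_id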

6. Data Governance and Security: Implement data governance policies, access controls, and data
security measures to ensure compliance with regulatory requirements and protect sensitive data.
Monitor and troubleshoot data-related issues, ensuring high availability and reliability of data
systems.

7. Documentation and Collaboration: Create comprehensive technical documentation, including data flow diagrams, system architecture, and standard operating procedures. Collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to understand their requirements and provide technical expertise.
