GCP Cloud
Job description
Responsibilities:
• Strong development skills in Python.
• Writing effective and scalable Python code.
• Strong experience in processing data and drawing insights from large data sets.
• Good familiarity with one or more libraries: pandas, NumPy, SciPy etc.
• In-depth knowledge of spaCy and similar NLP libraries like NLTK, textacy etc.
• Experience with Python development environments and visualization tools, including, but not limited to, Jupyter,
Google Colab notebooks, Matplotlib, Plotly, and geoplotlib.
• Advanced working SQL knowledge and experience with relational databases, including query authoring
and working familiarity with a variety of database systems.
• Experience performing root cause analysis on internal and external data and processes to answer specific
business questions and identify opportunities for improvement.
• Strong analytic skills related to working with unstructured datasets.
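The data-analysis work described above can be sketched in plain Python; this is a minimal illustration using only the standard library, with hypothetical record fields (in practice pandas or NumPy, named in the requirements, would typically be used):

```python
from statistics import mean, median

# Hypothetical order records; a real data set would be loaded with pandas/NumPy.
orders = [
    {"region": "south", "amount": 120.0},
    {"region": "north", "amount": 80.0},
    {"region": "south", "amount": 200.0},
    {"region": "north", "amount": 40.0},
]

def summarize_by_region(rows):
    """Group rows by region and compute simple per-group statistics."""
    groups = {}
    for row in rows:
        groups.setdefault(row["region"], []).append(row["amount"])
    return {
        region: {"count": len(vals), "mean": mean(vals), "median": median(vals)}
        for region, vals in groups.items()
    }

summary = summarize_by_region(orders)
```

The same group-and-aggregate step is a one-liner with `pandas.DataFrame.groupby`; the sketch only shows the shape of the insight-extraction task.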
Good to have some exposure to:
• Experience with setting up and maintaining data warehouses (Google BigQuery, Redshift, Snowflake)
and data lakes (GCS, AWS S3, etc.) for an organization
• Experience with relational SQL and NoSQL databases, including Postgres and Cassandra / MongoDB.
• Experience with data pipeline and workflow management tools: Airflow, Dataflow, Dataproc, etc.
• Exposure to any Business Intelligence (BI) tools like Tableau, Dundas, Power BI etc.
• Agile software development methodologies.
• Working in multi-functional, multi-location teams
4+ years' experience developing Big Data & Analytics solutions
Experience building data lake solutions leveraging Google Data Products (e.g. Dataproc, AI
Building Blocks, Looker, Cloud Data Fusion, Dataprep, etc.), Hive, Spark
Experience with relational SQL/NoSQL databases
Experience with Spark (Scala/Python/Java) and Kafka
Work experience with Databricks (Data Engineering and Delta Lake components)
Experience with source control tools such as GitHub and related dev process
Experience with workflow scheduling tools such as Airflow
In-depth knowledge of any scalable cloud vendor (GCP preferred)
Has a passion for data solutions
Strong understanding of data structures and algorithms
Strong understanding of solution and technical design
Has a strong problem solving and analytical mindset
Experience working with Agile Teams.
Able to influence and communicate effectively, both verbally and in writing, with team members and
business stakeholders
Able to quickly pick up new programming languages, technologies, and frameworks
Bachelor's degree in Computer Science
4+ Years of Experience in Data Engineering and building and maintaining large-scale data
pipelines.
Experience with designing and implementing a large-scale Data-Lake on Cloud Infrastructure
Strong technical expertise in Python, SQL, and shell scripting.
Extremely well-versed in Google Cloud Platform, including BigQuery, Cloud Storage, Cloud
Composer, Dataproc, Dataflow, and Pub/Sub.
Experience with Big Data tools such as Hadoop and Apache Spark (PySpark)
Experience developing DAGs in Apache Airflow 1.10.x or 2.x
Good problem-solving skills
Detail-oriented
Strong analytical skills working with large stores of databases and tables
Ability to work with geographically diverse teams.
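The Airflow DAG-authoring skill listed above looks roughly like the following in Airflow 2.x style; this is a minimal sketch, and the `dag_id`, schedule, and task callables are hypothetical placeholders, not a real pipeline:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    """Placeholder extract step (hypothetical)."""

def load():
    """Placeholder load step (hypothetical)."""

# schedule_interval works in both 1.10.x and 2.x (2.4+ also accepts `schedule`).
with DAG(
    dag_id="example_daily_pipeline",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # The >> operator declares the dependency: extract runs before load.
    extract_task >> load_task
```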
Good to Have:
Certification in GCP service.
Experience with Kubernetes.
Experience with Docker
Experience with CircleCI for Deployment
Experience with Great Expectations.
Responsibilities:
Build Data and ETL pipelines in GCP.
Support migration of data to the cloud using Big Data technologies like Spark, Hive, Talend, and Java
Interact with customers on a daily basis to ensure smooth engagement.
Responsible for timely and quality deliveries.
Fulfill organization responsibilities – Sharing knowledge and experience with the other groups in
the organization, and conducting various technical training sessions.
Job description
Greetings from HCL!
We are looking for a GCP Data Engineer for the Chennai location.
Experience: 4+ years
Skills:
Hands-on experience in Google Cloud (BigQuery)
Strong SQL programming knowledge and hands-on experience on real-time projects.
Good data analysis and problem solving skills
Good communication skills and a quick learner
Candidates with BE/BTech/MCA/MSc having the required experience. Tech stack: BigQuery, any ETL
tool (Informatica, Talend, DataStage), Dataflow, Dataproc
• 3-5 years' experience in data warehouse and data lake implementation
• 1-2 years of experience in Google Cloud Platform (especially BigQuery).
• 1-2 years of working experience converting ETL jobs (in Informatica/Talend/DataStage) into
Dataflow or Dataproc and migrating them into a CI/CD pipeline
• Design, develop and deliver data integration/data extraction solutions using IBM DataStage or other ETL
tools and Data Warehouse platforms like Teradata, BigQuery.
• Proficiency in Linux/Unix shell scripting and SQL.
• Knowledge of data Modelling, database design, and the data warehousing ecosystem.
• Ability to troubleshoot and solve complex technical problems.
• Excellent analytical and problem-solving skills.
• Knowledge of working in Agile environments.
Essential Skills
3+ years' experience in developing large-scale data pipelines on at least one cloud service:
Azure, AWS, or GCP
Expertise in one or more of the following (database + ETL/pipeline + visualization/reporting)
skills. Azure: Synapse, ADF, HDInsight; AWS: Redshift, Glue, EMR
Highly proficient in one or more market-leading ETL tools such as Informatica, DataStage, SSIS,
or Talend
Fundamental knowledge in Data warehouse/Data Mart architecture and modelling
Define and develop data ingest, validation, and transform pipelines.
Fundamental knowledge of distributed data processing and storage
Fundamental knowledge of working with structured, unstructured, and semi-structured data
For the cloud data engineer role, experience with ETL/ELT patterns, preferably using Azure Data
Factory and Databricks jobs
Nice to have: on-premise platform understanding covering one or
more of the following: Teradata, Cloudera, Netezza, Informatica, DataStage, SSIS, BODS, SAS,
Business Objects, Cognos, MicroStrategy, WebFocus, Crystal
Essential Qualification
BE/BTech in Computer Science, Engineering, or a relevant field
Responsibilities:
1. Data Migration: Collaborate with cross-functional teams to migrate data from various sources to
GCP. Develop and implement efficient data migration strategies, ensuring data integrity and security
throughout the process.
2. Data Pipeline Development: Design, develop, and maintain robust data pipelines that extract,
transform, and load (ETL) data from different sources into GCP. Implement data quality checks and
ensure scalability, reliability, and performance of the pipelines.
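The extract-transform-load flow with data quality checks described in this point can be sketched in plain Python; the field names and validation rules below are hypothetical, and on GCP this logic would typically run inside a Dataflow job or a Cloud Composer/Airflow task:

```python
def extract(source_rows):
    """Extract: read raw records from a source (here, an in-memory list)."""
    return list(source_rows)

def transform(rows):
    """Transform: normalize fields and drop records failing quality checks."""
    clean = []
    for row in rows:
        # Data quality check: require a non-empty id and a non-negative amount.
        if not row.get("id") or row.get("amount", -1) < 0:
            continue
        clean.append({"id": str(row["id"]), "amount": round(float(row["amount"]), 2)})
    return clean

def load(rows, sink):
    """Load: append validated records to a destination (a list standing in
    for a warehouse table such as a BigQuery table)."""
    sink.extend(rows)
    return len(rows)

raw = [{"id": 1, "amount": 9.99}, {"id": None, "amount": 5}, {"id": 2, "amount": -3}]
table = []
loaded = load(transform(extract(raw)), table)
```

Keeping the three stages as separate functions is what makes the quality checks testable in isolation, which is the property the scalability and reliability requirements above depend on.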
3. Data Management: Build and maintain data models and schemas in GCP, ensuring optimal storage,
organization, and accessibility of data. Collaborate with data scientists and analysts to understand
their data requirements and provide solutions to meet their needs.
4. GCP Data Service Expertise: Utilize your deep understanding of GCP data services, including
BigQuery, Dataproc, and other relevant services, to architect and implement efficient and
scalable data solutions. Stay up to date with the latest advancements in GCP data services and
recommend innovative approaches to leverage them.
5. Performance Optimization: Identify and resolve performance bottlenecks within the data pipelines
and GCP data services. Optimize queries, job configurations, and data processing techniques to
improve overall system efficiency.
6. Data Governance and Security: Implement data governance policies, access controls, and data
security measures to ensure compliance with regulatory requirements and protect sensitive data.
Monitor and troubleshoot data-related issues, ensuring high availability and reliability of data
systems.
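One common building block of the data governance work in point 6 is masking sensitive columns before records leave a restricted zone; the sketch below is a minimal illustration, and the column names and masking rule are hypothetical, not a statement of any particular compliance policy:

```python
SENSITIVE_COLUMNS = {"email", "ssn"}  # hypothetical governance policy

def mask_value(value):
    """Redact all but the last two characters of a sensitive value."""
    text = str(value)
    return "*" * max(len(text) - 2, 0) + text[-2:]

def apply_masking(record, sensitive=SENSITIVE_COLUMNS):
    """Return a copy of the record with sensitive columns masked."""
    return {
        key: mask_value(val) if key in sensitive else val
        for key, val in record.items()
    }

row = {"id": 7, "email": "user@example.com"}
masked = apply_masking(row)
```

In a production GCP pipeline the same policy would more likely be enforced with BigQuery column-level access controls or Cloud DLP rather than hand-rolled code; the sketch only shows the shape of the transformation.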