
Santosh Goud

Email: santoshgoud2526@gmail.com
Contact: 737-372-2211.
Senior AWS Big Data Engineer
LinkedIn:

OVERALL SUMMARY:

 Overall 8+ years of professional experience in Information Technology, with expertise in Big Data using the Hadoop framework and in Analysis, Design, Development, Testing, Documentation, Deployment, and Integration using SQL and Big Data technologies.
 Expertise in major Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Flume, Oozie, Zookeeper, and Hue.
 Good Experience in implementing and orchestrating data pipelines using Oozie and Airflow.
 Worked with Cloudera and Hortonworks distributions.
 Expertise working with AWS cloud services such as EMR, S3, Redshift, and CloudWatch for big data development.
 Good working knowledge of the Amazon Web Services (AWS) Cloud Platform, including EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
 Expert in building Enterprise Data Warehouses and data warehouse appliances from scratch using both the Kimball and Inmon approaches.
 Created data pipelines for the Kafka cluster and processed the data using Spark Streaming (a minimal sketch follows this summary).
 Created AWS Glue jobs to load incremental data into the S3 staging and persistence areas.
 Worked on streaming pipelines that consume data from Kafka topics and load it into the landing area for near-real-time reporting.
 Experience in developing enterprise-level solutions using batch processing (Apache Pig) and streaming frameworks (Spark Streaming, Apache Kafka, and Apache Flink).
 Experience developing Kafka producers and consumers for streaming millions of events per second.
 Experience in designing Star and Snowflake schemas for Data Warehouse and ODS architectures.
 Experience in importing and exporting data with Sqoop between HDFS and relational database systems, loading it into partitioned Hive tables.
 Experience in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL; also used UDFs from the Piggybank UDF repository.
 Good knowledge of writing MapReduce jobs through Pig, Hive, and Sqoop.
 Extensive knowledge of writing Hadoop jobs for data analysis per business requirements using Hive; worked on HiveQL queries for data extraction, join operations, and custom UDFs, with good experience in optimizing Hive queries.
 Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data governance and Metadata
Management, Master Data Management and Configuration Management.
 Experience in developing customized UDF’s in Python to extend Hive and Pig Latin functionality.
 Expertise in designing complex mappings and in performance tuning of Slowly Changing Dimension tables and Fact tables.
 Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
 Good understanding of distributed systems, HDFS architecture, Internal working details of MapReduce and
Spark processing frameworks.
 Good experience in Oozie Framework and Automating daily import jobs.
 Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL
Solutions and Data Warehouse tools for reporting and data analysis.
 Developed dataset processes for data modeling and data mining; recommended ways to improve data reliability, efficiency, and quality.
 Extensive knowledge of reporting objects such as facts, attributes, hierarchies, transformations, filters, prompts, calculated fields, sets, groups, and parameters in Tableau.
 Experience in working with Flume and NiFi for loading log files into Hadoop.
 Experience in working with NoSQL databases like HBase and Cassandra.
 Experienced in creating shell scripts to push data loads from various sources from the edge nodes onto the
HDFS.
 Experienced in building automated regression scripts for validation of ETL processes between multiple databases such as Oracle, SQL Server, Hive, and MongoDB using Python.
 Proficiency in SQL across several dialects (MySQL, PostgreSQL, Redshift, SQL Server, and Oracle).
 Strong experience in core Java, Scala, SQL, PL/SQL, and RESTful web services.
 Collected log data from various sources and integrated it into HDFS using Flume.
 Experience in maintaining an Apache Tomcat, MySQL, LDAP, and web service environment.
 Designed ETL workflows in Tableau and deployed data from various sources to HDFS.
 Good experience with use-case development and software methodologies such as Agile and Waterfall.
 Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
 Good knowledge of Data Marts, OLAP, and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema and Snowflake Schema Modeling for Fact and Dimension tables) using Analysis Services.
 Strong analytical and problem-solving skills and the ability to follow through with projects from inception
to completion.
 Ability to work effectively in cross-functional team environments, excellent communication, and
interpersonal skills.
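
A minimal sketch of the Kafka-to-landing pattern described above, using PySpark Structured Streaming. The broker address, topic name, and S3 paths are illustrative placeholders, and the job assumes a Spark cluster with the spark-sql-kafka connector available.

# Sketch: consume a Kafka topic and land raw events in a staging area for
# near-real-time reporting. All names and paths below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-landing").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
          .option("subscribe", "events_topic")                 # placeholder topic
          .option("startingOffsets", "latest")
          .load()
          .select(col("key").cast("string"), col("value").cast("string")))

query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/landing/events/")                 # placeholder landing path
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
         .start())

query.awaitTermination()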

TECHNICAL SKILLS:

 Big Data Tools: Hadoop Ecosystem (MapReduce, Spark, Airflow, NiFi, HBase, Hive, Pig, Sqoop, Kafka, Oozie)
 Methodologies: RAD, JAD, System Development Life Cycle (SDLC), Agile
 Cloud Platform: AWS (Amazon Web Services), Microsoft Azure
 Cloud Management: Amazon Web Services (AWS) - EC2, EMR, S3, Redshift, Lambda, Athena
 Data Modeling Tools: Erwin Data Modeler, ER Studio v17
 Programming Languages: SQL, PL/SQL, Python, Scala, Java, and UNIX shell scripting.
 OLAP Tools: Tableau, SSAS, Business Objects, and Crystal Reports 9
 Databases: Oracle 12c/11g, Teradata R15/R14.
 ETL/Data warehouse Tools: Informatica 9.6/9.1, and Tableau.
 Operating System: Windows, Unix, Sun Solaris

PROJECT EXPERIENCE:

Client: AT&T, Plano, TX Mar 2021- Present


Title: Senior AWS Big Data Engineer
Responsibilities:

 Worked on ingesting data through cleansing and transformation steps, leveraging AWS Lambda, AWS Glue, and Step Functions.
 Involved in conducting JAD sessions to identify the source systems and data needed by Actimize-SAM
(KYC/CIP).
 Assisted with FATCA testing using internal software to ensure that proper controls were in place for the
new regulation
 Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.
 Created a Lambda deployment function and configured it to receive events from S3 buckets.
 Wrote UNIX shell scripts to automate jobs and scheduled cron jobs for job automation using crontab.
 Developed various Mappings with the collection of all Sources, Targets, and Transformations using
Informatica Designer
 Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.
 Wrote MapReduce programs and Hive UDFs in Java.
 Developed mappings using transformations such as Expression, Filter, Joiner, and Lookup for better data massaging and to migrate clean and consistent data.
 Developed report layouts for Suspicious Activity and Pattern analysis under AML regulations 
 Prepared and analyzed AS IS and TO BE in the existing architecture and performed Gap Analysis. Created
workflow scenarios, designed new process flows and documented the Business Process and various
Business Scenarios and activities of the Business from the conceptual to procedural level.
 Migrated data from on-premises systems to AWS storage buckets.
 Created Spark code to process streaming data from the Kafka cluster and load it into the staging area for processing.
 Created data pipelines for business reports and processed streaming data using the on-premises Kafka cluster.
 Processed data from Kafka topics and surfaced the real-time streams in dashboards.
 Developed a Python script to transfer data from on-premises systems to AWS S3.
 Developed a Python script to call REST APIs and extract data to AWS S3 (a minimal sketch follows this section).
 Analyzed business requirements and employed Unified Modeling Language (UML) to develop high-level
and low-level Use Cases, Activity Diagrams, Sequence Diagrams, Class Diagrams, Data-flow Diagrams,
Business Workflow Diagrams, Swim Lane Diagrams, using Rational Rose
 Worked with senior developers to implement ad-hoc and standard reports using Informatica, Cognos, MS
SSRS and SSAS.
 Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.
 Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for efficient data access.
 Thorough understanding of various modules of AML including Watch List Filtering, Suspicious Activity
Monitoring, CTR, CDD, and EDD.
 Performed ETL from multiple sources such as Kafka, NiFi, Teradata, and DB2 using Spark on Hadoop.
 Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Python, as well as in NoSQL databases such as HBase and Cassandra.
 Collected data with Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
 Developed Spark scripts by writing custom RDDs in Scala for data transformations and performed actions on RDDs.
 Responsible for building scalable distributed data solutions using Amazon EMR cluster environments.
 Developed Java MapReduce programs to analyze sample log files stored in the cluster.
 Used Apache NiFi to copy data from local file system to HDP.
 Worked on Dimensional and Relational Data Modeling using Star and Snowflake Schemas, OLTP/OLAP
system, Conceptual, Logical and Physical data modeling using Erwin.
 Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
 Created YAML files for each data source, including Glue table stack creation.
 Worked on a Python script to extract data from Netezza databases and transfer it to AWS S3.
 Developed Lambda functions with IAM roles to run Python scripts, wired to various triggers (SQS, EventBridge, SNS).
 Developed highly complex Python and Scala code, which is maintainable, easy to use, and satisfies
application requirements, data processing and analytics using inbuilt libraries.
 Designed and implemented Sqoop incremental jobs to read data from DB2 and load it into Hive tables, and connected Tableau through HiveServer2 to generate interactive reports.
 Used Sqoop to channel data between HDFS and RDBMS sources.
 Automated data processing with Oozie to load data into the Hadoop Distributed File System.
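
The REST-to-S3 extract mentioned above can be illustrated with the hypothetical Python sketch below; the endpoint URL, bucket, and key layout are placeholder assumptions rather than the client's actual configuration.

# Hypothetical sketch of a REST-to-S3 extract; endpoint, bucket, and key layout
# are placeholders.
import json
import boto3
import requests

API_URL = "https://api.example.com/v1/records"   # placeholder endpoint
BUCKET = "example-staging-bucket"                # placeholder bucket

def extract_to_s3(run_date: str) -> str:
    """Pull one day of records from the REST API and land them in S3 as JSON."""
    response = requests.get(API_URL, params={"date": run_date}, timeout=60)
    response.raise_for_status()

    key = f"landing/api_records/{run_date}.json"
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key,
                                  Body=json.dumps(response.json()))
    return key

if __name__ == "__main__":
    print(extract_to_s3("2021-03-01"))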

Environment: AWS, Spark SQL, Oracle 12c, PL/SQL, MapReduce, Hive, Impala, Scala, Erwin, Java, Big Data, Hadoop, PySpark, Python, Kafka, SAS, MDM, Oozie, SSIS, T-SQL, ETL, HDFS, Cosmos, Pig, Sqoop, MS Access.

Client: DTE Energy, Detroit, MI Jun 2019- Feb 2021


Title: AWS Data Engineer
Responsibilities:
 Primarily responsible for converting a manual reporting system into a fully automated CI/CD data pipeline that ingests data from different marketing platforms into the AWS S3 data lake.
 Designed AWS architecture, cloud migration, AWS EMR, DynamoDB, and Redshift solutions, with event processing using Lambda functions.
 Used Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
 Used AWS Systems Manager to automate operational tasks across AWS resources.
 Utilized AWS services with a focus on big data analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, and flexibility.
 Involved in the development of real time streaming applications using PySpark, Apache Flink, Kafka, Hive
on distributed Hadoop Cluster.
 Experienced in Importing and exporting data into HDFS and Hive using Sqoop.
 Participated in development/implementation of Cloudera Hadoop environment.
 Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
 Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for
network entities on the network
 Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
 Created Amazon S3 storage for data and worked on transferring data from Kafka topics into S3.
 Responsible for building scalable distributed data solutions using Amazon EMR cluster environments.
 Wrote Lambda function code and set a CloudWatch Events rule as the trigger with a cron expression.
 Validated Sqoop jobs and shell scripts and performed data validation to check that data loaded correctly without any discrepancy; performed migration and testing of static and transaction data from one core system to another.
 Built S3 buckets, managed their policies, and used S3 and Glacier for storage and backup on AWS.
 Developed Spark scripts by writing custom RDDs in Scala for data transformations and performed actions on RDDs.
 Designed and developed Flink pipelines to consume streaming data from Kafka and applied business logic to massage, transform, and serialize raw data.
 Created Metric tables, End user views in Snowflake to feed data for Tableau refresh.
 Generated custom SQL to verify dependencies for the daily, weekly, and monthly jobs.
 Using Nebula Metadata, registered Business and Technical Datasets for corresponding SQL scripts
 Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
 Developed a data pipeline using Kafka and Storm to store data into HDFS.
 Implemented Kafka producers, created custom partitions, configured brokers, and implemented high-level consumers for the data platform.
 Developed best practices, processes, and standards for effectively carrying out data migration activities; worked across multiple functional projects to understand data usage and implications for data migration.
 Prepared data migration plans including migration risks, milestones, quality, and business sign-off details.
 Oversaw the migration process from a business perspective; coordinated between leads, process managers, and project managers; performed business validation of uploaded data.
 Worked on retrieving data from the file system to S3 using Spark commands.
 Performed advanced procedures like text analytics and processing, using the in-memory computing
capabilities of Spark using Scala.
 Worked on migrating MapReduce programs into Spark transformations using Scala.
 Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
 Closely involved in scheduling Daily, Monthly jobs with Precondition/Postcondition based on the
requirement.
 Wrote Python modules to extract data from the MySQL source database.
 Worked on Cloudera distribution and deployed on AWS EC2 Instances.
 Migrated highly available web servers and databases to AWS EC2 and RDS with minimal or no downtime.
 Worked with AWS IAM to generate new accounts, assign roles and groups.
 Deployed the project on Amazon EMR with S3 connectivity for setting a backup storage.
 Created Jenkins jobs for CI/CD using git, Maven and Bash scripting
 Built regression test suite in CI/CD pipeline with Data setup, test case execution and tear down using
Cucumber- Gherkin, Java, Spring DAO, PostgreSQL
 Connected Redshift to Tableau for creating dynamic dashboard for analytics team.
 Set up the connection between S3 and AWS SageMaker (machine learning platform) used for predictive analytics, and uploaded inference data to Redshift.
 Conducted ETL data integration, cleansing, and transformations using AWS Glue Spark scripts (a hedged sketch follows this section).
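
As a rough illustration of the Glue-based cleansing step mentioned above, here is a hedged sketch of an AWS Glue Spark job; the S3 paths, column name, and cleansing rule are assumptions, not the client's actual catalog objects.

# Hedged sketch of an AWS Glue Spark job: read raw CSV from S3, drop records
# missing a key field, and write cleansed Parquet to a staging path.
# All paths and the column name are placeholders.
import sys
from awsglue.transforms import Filter
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw files from the landing area (placeholder path).
raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/raw/marketing/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Simple cleansing rule: keep only rows that carry a customer_id.
cleansed = Filter.apply(frame=raw, f=lambda row: row["customer_id"] not in (None, ""))

# Persist the cleansed data as Parquet in the staging area (placeholder path).
glue_context.write_dynamic_frame.from_options(
    frame=cleansed,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/staging/marketing/"},
    format="parquet",
)

job.commit()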

Environment: Redshift, DynamoDB, PySpark, EC2, EMR, Glue, S3, Java, Kafka, IAM, PostgreSQL, Jenkins, Maven, AWS CLI, Shell Scripting, Git.

Client: Scotia Bank, New York City, NY Feb 2017- May 2019
Title: Big Data Engineer
Responsibilities:

 Participated in data acquisition with the Data Engineer team to extract clinical and imaging data from several data sources such as flat files and other databases.
 Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
 Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
 Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
 Setup and benchmarked Hadoop/Hbase clusters for internal use.
 Architected and implemented medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
 Developed Java Map Reduce programs for the analysis of sample log file stored in cluster.
 Developed simple to complex MapReduce jobs using Hive and Pig.
 Developed Map Reduce Programs for data analysis and data cleaning.
 Performed Data Preparation by using Pig Latin to get the right data format needed.
 Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, and verified their performance against MapReduce jobs (a minimal sketch follows this section).
 Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by
business users.
 Primarily involved in Data Migration process using Azure by integrating with GitHub repository and
Jenkins.
 Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
 Utilized the clinical data to generate features to describe the different illnesses by using LDA Topic
Modelling.
 Utilized Waterfall methodology for team and project management.
 Used Git for version control with Data Engineer team and Data Scientists colleagues.
 Built machine learning models to showcase big data capabilities using PySpark and MLlib.
 Enhanced the data ingestion framework by creating more robust and secure data pipelines.
 Implemented data streaming capability using Kafka and Talend for multiple data sources.
 Worked with multiple storage formats (Avro, Parquet) and databases (Hive, Impala, Kudu).
 Processed image data through the Hadoop distributed system using Map and Reduce, then stored it in HDFS.
 Used Scala to store streaming data in HDFS and implemented Spark for faster data processing.
 Developed the Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
 Created Session Beans and controller Servlets for handling HTTP requests from Talend
 Performed data visualization and designed dashboards with Tableau, and generated complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.
 Wrote documentation for each report including purpose, data source, column mapping, transformation,
and user group.
 Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment.
 Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
 Populated HDFS and PostgreSQL with huge amounts of data using Apache Kafka.
 Working knowledge of cluster security components like Kerberos, Sentry, SSL/TLS etc.
 Involved in the development of agile, iterative, and proven data modeling patterns that provide flexibility.
 Knowledge of implementing JILs to automate jobs in the production cluster.
 Troubleshot users' analysis bugs (JIRA and IRIS tickets).
 Worked with SCRUM team in delivering agreed user stories on time for every Sprint. 
 Worked on analyzing and resolving the production job failures in several scenarios.
 Implemented UNIX scripts to define the use case workflow and to process the data files and automate the
jobs.
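
A minimal sketch of the kind of PySpark aggregation and validation script run on HDInsight, as mentioned above; the storage paths, schema, and validation rule are illustrative assumptions.

# Sketch: aggregate clinical visit data and apply a simple validation check.
# The WASB paths, columns, and rule are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hdinsight-aggregation").getOrCreate()

# Placeholder path on the cluster's default Azure Blob storage.
visits = spark.read.parquet(
    "wasbs://data@exampleaccount.blob.core.windows.net/clinical/visits/")

# Aggregate visit counts per facility per day.
daily = (visits
         .groupBy("facility_id", F.to_date("visit_ts").alias("visit_date"))
         .agg(F.count("*").alias("visit_count")))

# Simple validation: fail fast if the aggregation produced no rows at all.
if daily.rdd.isEmpty():
    raise ValueError("Validation failed: no aggregated rows were produced")

(daily.write
 .mode("overwrite")
 .partitionBy("visit_date")
 .parquet("wasbs://data@exampleaccount.blob.core.windows.net/curated/daily_visits/"))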

Environment: Ubuntu, Hadoop, Spark (PySpark, NiFi, Jenkins, Kafka, Talend, Spark SQL, Spark MLlib), MS Azure, Azure Databricks, Azure SQL, Azure Data Factory (ADF), Azure Data Lake, Pig, Python 3.x (NLTK, Pandas), Tableau, GitHub, and OpenCV.

Client: Fingent, Cochin, India Mar 2014- Oct 2016


Role: Data Engineer/Hadoop Spark Developer
Responsibilities:
 Extensively worked with the Spark SQL context to create DataFrames and Datasets to preprocess the model data.
 Data analysis: expertise in analyzing data using Pig scripting, Hive queries, Spark (Python), and Impala.
 Used Hive to implement a data warehouse and stored data in HDFS on Hadoop clusters set up in AWS EMR.
 Involved in designing the row key in HBase to store Text and JSON as key values in HBase table and
designed row key in such a way to get/scan it in a sorted order.
 Wrote JUnit tests and integration test cases for those microservices.
 Worked in Azure environment for development and deployment of Custom Hadoop Applications.
 Worked heavily with Python, C++, Spark, SQL, Airflow, and Looker.
 Experienced in loading and transforming large sets of Structured, Semi-Structured and Unstructured data
and analyzed them by running Hive queries and Pig scripts.
 Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
 Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by
creating ETL pipelines using Pig, and Hive.
 Built pipelines to move hashed and un-hashed data from XML files to Data lake.
 Developed a NiFi workflow to pick up multiple files from an FTP location and move them to HDFS on a daily basis.
 Scripting: Expertise in Hive, PIG, Impala, Shell Scripting, Perl Scripting, and Python.
 Worked with developer teams on NiFi workflow to pick up the data from rest API server, from data lake as
well as from SFTP server and send that to Kafka.
 Developed business logic using the Kafka Direct Stream in Spark Streaming and implemented business transformations (a minimal sketch follows this section).
 Proven experience with ETL frameworks (Airflow, Luigi).
 Created Hive schemas using performance techniques like partitioning and bucketing.
 Used Hadoop YARN to perform analytics on data in Hive.
 Developed and maintained batch data flow using HiveQL and Unix scripting
 Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
 Built large-scale data processing systems for data warehousing solutions and worked with unstructured data mining on NoSQL.
 S3 data lake management: responsible for maintaining and handling inbound and outbound data requests through the big data platform.
 Specified the cluster size, resource pool allocation, and Hadoop distribution by writing the specifications in JSON format.
 Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and
monthly batch cycles.
 Configured Hadoop tools like Hive, Pig, Zookeeper, Flume, Impala and Sqoop.
 Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
 Queried both Managed and External tables created by Hive using Impala.
 Developed customized Hive UDFs and UDAFs in Java, set up JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
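
A minimal, hypothetical sketch of the Kafka Direct Stream pattern referenced above, written against the pre-3.0 PySpark Streaming API (KafkaUtils); the broker list, topic, transformation, and HDFS prefix are placeholders, and the job assumes the spark-streaming-kafka package is on the classpath.

# Sketch: direct-stream consumption of a Kafka topic with a per-record
# transformation, saving each micro-batch to HDFS. Names are placeholders.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-direct-stream")
ssc = StreamingContext(sc, batchDuration=30)  # 30-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["orders"],                                      # placeholder topic
    kafkaParams={"metadata.broker.list": "broker1:9092"})   # placeholder brokers

# Each record is a (key, value) pair; keep the value and apply the
# business transformation (upper-casing stands in for the real logic).
transformed = stream.map(lambda kv: kv[1].upper())

# Write each batch under a timestamped HDFS prefix (placeholder path).
transformed.saveAsTextFiles("hdfs:///data/landing/orders/batch")

ssc.start()
ssc.awaitTermination()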

Environment: Hadoop, Microservices, Java, MapReduce, Agile, HBase, JSON, Spark, Kafka, JDBC, AWS EMR/EC2/S3, Hive, Pig, Flume, Zookeeper, Impala, Sqoop
Client: Axis Bank, Mumbai, India Sep 2012- Nov 2013
Role: Data Engineer
Responsibilities:

 Research and recommend suitable technology stack for Hadoop migration considering current enterprise
architecture.
 Responsible for building scalable distributed data solutions using Hadoop.
 Experienced in loading and transforming of large sets of structured, semi-structured and unstructured
data.
 Experienced in developing Spark scripts for data analysis in both Python and Scala.
 Wrote Scala scripts to make Spark Streaming work with Kafka as part of the Spark-Kafka integration efforts.
 Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
 Created reports in Tableau for visualization of the datasets created, and tested Spark SQL connectors.
 Implemented Hive complex UDF's to execute business logic with Hive Queries.
 Developed Spark jobs and Hive jobs to summarize and transform data (a minimal sketch follows this section).
 Developed different kinds of custom filters and handled pre-defined filters on HBase data using the API.
 Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
 Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive, and then loading the data into HDFS.
 Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
 Collected and aggregated large amounts of log data and staged it in HDFS for further analysis.
 Experience in managing and reviewing Hadoop Log files.
 Used Sqoop to transfer data between relational databases and Hadoop.
 Worked on HDFS to store and access huge datasets within Hadoop.
 Good hands-on experience with GitHub.
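
The summarize-and-transform Spark jobs mentioned above could look roughly like the hedged PySpark sketch below; the input path, schema, query, and output location are assumptions for illustration only.

# Sketch: summarize raw transaction files from HDFS with Spark SQL and write
# the result back to HDFS. Paths, columns, and query are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("summarize-transactions").getOrCreate()

# Raw CSV landed by Sqoop/Flume (placeholder path and schema).
transactions = (spark.read
                .option("header", True)
                .option("inferSchema", True)
                .csv("hdfs:///data/raw/transactions/"))

transactions.createOrReplaceTempView("transactions")

# Summarize per branch and day with Spark SQL.
summary = spark.sql("""
    SELECT branch_id,
           to_date(txn_ts) AS txn_date,
           COUNT(*)        AS txn_count,
           SUM(amount)     AS total_amount
    FROM transactions
    GROUP BY branch_id, to_date(txn_ts)
""")

summary.write.mode("overwrite").parquet("hdfs:///data/curated/branch_daily_summary/")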

Environment: Cloudera Manager, HDFS, Sqoop, Pig, Hive, Oozie, Spark SQL, Tableau, MySQL, Python, Kafka, Flume, Java, Scala, Git.
