Bigdata & Hadoop
Shushrutha Reddy K
M.Tech in Computational Engineering from RGUKT
Senior BigData Developer @ServiceNow
Agenda
• Bigdata
• Hadoop
• MapReduce
• YARN
• Spark
• Amazon EMR
Friday, 21 January 2022
How It All Started?
BigData is a term for collections of data sets so large and complex
that they are difficult to store and process using available database
management tools or traditional data processing applications.
Cost Reduction
Descriptive Analytics
• Data aggregation and data mining to provide insight into the past
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
3. Horizontal Scalability
• Records each and every change that takes place to the file system metadata
• If a file is deleted in HDFS, the NameNode will immediately record this in the EditLog
• Regularly receives a Heartbeat and a block report from all the DataNodes in the cluster to
ensure that the DataNodes are alive
• Keeps a record of all the blocks in HDFS and the DataNodes on which they are stored
• Responsible for serving read and write requests from the clients
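The bookkeeping described in the bullets above can be sketched in a few lines of Python. This is a toy model for illustration only, not Hadoop code: the class `ToyNameNode`, its method names, and the 10-second timeout are all assumptions made for the example.

```python
import time


class ToyNameNode:
    """Illustrative stand-in for NameNode bookkeeping (hypothetical, not Hadoop code)."""

    def __init__(self, heartbeat_timeout=10.0):
        self.edit_log = []          # append-only record of metadata changes
        self.block_locations = {}   # block id -> set of DataNode ids holding it
        self.last_heartbeat = {}    # DataNode id -> time of its last heartbeat
        self.heartbeat_timeout = heartbeat_timeout  # assumed value, in seconds

    def record_edit(self, operation, path):
        # Each metadata change is appended to the log as soon as it happens.
        self.edit_log.append((operation, path))

    def receive_heartbeat(self, datanode_id, block_report, now=None):
        # The heartbeat shows the DataNode is alive; the block report lists
        # the blocks it currently stores.
        now = time.time() if now is None else now
        self.last_heartbeat[datanode_id] = now
        # Forget what this node reported earlier, then apply the new report.
        for holders in self.block_locations.values():
            holders.discard(datanode_id)
        for block_id in block_report:
            self.block_locations.setdefault(block_id, set()).add(datanode_id)

    def live_datanodes(self, now=None):
        # A DataNode counts as alive if it sent a heartbeat recently enough.
        now = time.time() if now is None else now
        return sorted(
            node for node, seen in self.last_heartbeat.items()
            if now - seen <= self.heartbeat_timeout
        )


namenode = ToyNameNode(heartbeat_timeout=10.0)
namenode.record_edit("CREATE", "/logs/a.txt")
namenode.record_edit("DELETE", "/logs/a.txt")   # a delete is logged immediately
namenode.receive_heartbeat("dn1", ["blk_1", "blk_2"], now=0.0)
namenode.receive_heartbeat("dn2", ["blk_2"], now=8.0)
alive_at_12 = namenode.live_datanodes(now=12.0)  # dn1 has been silent too long
```

Passing `now` explicitly keeps the example deterministic; a real NameNode works from wall-clock time and persists its edit log to disk.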
Deer, Bear, River, Car, Car, River, Deer, Car and Bear
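A word count over this sample input is a common way to illustrate MapReduce. The sketch below imitates the split, map, shuffle and reduce phases in plain Python on a single machine. The function names and the split size of three words are assumptions for the example; a real Hadoop job would use the MapReduce API and run the phases across the cluster.

```python
from collections import defaultdict

SAMPLE = "Deer, Bear, River, Car, Car, River, Deer, Car and Bear"


def split_input(text, words_per_split=3):
    # Normalise the final "and" to a comma, then cut the input into splits.
    words = [w.strip() for w in text.replace(" and ", ", ").split(",")]
    return [words[i:i + words_per_split]
            for i in range(0, len(words), words_per_split)]


def map_phase(split):
    # Each mapper emits a (word, 1) pair for every word in its split.
    return [(word, 1) for word in split]


def shuffle_phase(mapped_pairs):
    # Group all values by key so each reducer sees one word with all its counts.
    grouped = defaultdict(list)
    for word, count in mapped_pairs:
        grouped[word].append(count)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    # Each reducer sums the counts for its word.
    return {word: sum(counts) for word, counts in grouped.items()}


splits = split_input(SAMPLE)
mapped = [pair for split in splits for pair in map_phase(split)]
grouped = shuffle_phase(mapped)
word_counts = reduce_phase(grouped)
print(word_counts)  # {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}
```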
A cluster-level component (one per cluster) that runs on the master machine
Two Components:
2) Get Application ID
5) Allocate Resources
6 a) Container
b) Launch
7) Execute
• Up to 100x faster than Hadoop MapReduce for large-scale data processing, by exploiting
in-memory computation and other optimizations
• Elastic - Auto Scaling can be used to change the number of instances automatically
• Economical - Low cost, with support for Amazon EC2 Spot and Reserved Instances
• Secure - Built-in firewall settings protect instances and control network access
to them
• Flexible - Allows root access to any instance, installation of additional
applications, and customization of the cluster with bootstrap actions
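As a rough illustration of the Spot and bootstrap-action points above, the sketch below builds a request for the boto3 EMR `run_job_flow` call without contacting AWS. The helper `build_emr_request`, the cluster name, the S3 script path, the release label, the instance types and the group sizes are all placeholder assumptions, and the role names are the EMR defaults, which may not exist in a given account.

```python
def build_emr_request(cluster_name, bootstrap_script_path, task_count=2):
    """Build keyword arguments for boto3's EMR run_job_flow (hypothetical helper)."""
    return {
        "Name": cluster_name,
        "ReleaseLabel": "emr-6.5.0",  # assumed release; pick one available to you
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
                # Task nodes hold no HDFS data, so they are a common fit for Spot.
                {"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
                 "InstanceType": "m5.xlarge", "InstanceCount": task_count},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # A bootstrap action runs a script on each node while the cluster starts.
        "BootstrapActions": [
            {"Name": "install-extra-packages",
             "ScriptBootstrapAction": {"Path": bootstrap_script_path, "Args": []}},
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_emr_request(
    "demo-cluster",                          # placeholder name
    "s3://example-bucket/bootstrap.sh",      # placeholder script location
)
spot_groups = [g["Name"] for g in request["Instances"]["InstanceGroups"]
               if g["Market"] == "SPOT"]

# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Keeping the request as a plain dictionary makes it easy to inspect or adjust before anything is launched, which matters because a running cluster incurs cost.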