
BIG DATA

BIG DATA
• Big data is a field that deals with ways to analyze, and systematically
extract information from, data sets that are too large or complex for
traditional data-processing software.
• Big data analysis challenges include capturing data, data
storage, data analysis, search, sharing, transfer, visualization,
querying, updating, information privacy, and data source.
• In simple words, big data often includes data with sizes that
exceed the capacity of traditional software to process within
an acceptable time and value.
• Big data was originally associated with three key concepts:
volume, variety and velocity.

●2
VOLUME of data
• Big data represents volumes of data that cannot be processed by
traditional data-processing software.

• While traditional data is measured in familiar sizes like megabytes,
gigabytes and terabytes, big data is stored in petabytes and zettabytes.

• 1 Petabyte = 10¹⁵ Bytes and 1 Zettabyte = 10²¹ Bytes

• Big data technologies provide the means of handling such large volumes of data.

●3
VELOCITY of data
• If data is being produced rapidly and the time available to analyze it
accurately is short, then big data analytics can be used.
• While some forms of data can be batch processed and remain relevant
over time, much of big data streams into organizations at high speed and
requires immediate action for the best outcomes.
• Sensor data from health devices is a great example. The
ability to instantly process health data can provide users and
physicians with potentially life-saving information.

●4
VARIETY of data
• Big Data Variety refers to the different types of data collected
and processed in a big data environment.
• It includes structured, semi-structured, and unstructured data.
• Everything from emails and videos to scientific and meteorological data
can constitute a big data stream, each with its own unique attributes.
• Big data analysis provides a way for data engineers to integrate the
vast amounts of complex information created by sensors, networks,
transactions, smart devices, web usage, and more.

●5
BIG DATA ANALYTICS
• Big data analytics refers to collecting, processing, cleaning,
and analyzing large datasets to help organizations
operationalize their big data.
• COLLECTION: With today's technology, organizations can gather both
structured and unstructured data from a variety of sources such as cloud
storage, mobile applications, and video servers.
• This data can be stored in data warehouses where business
intelligence tools and digital solutions can access it easily.
• Raw or unstructured data that is too diverse or complex for a
warehouse may be assigned metadata and stored in a data
lake.

●6
BIG DATA ANALYTICS

• PROCESSING: Mainly two ways (see the sketch at the end of this slide)
• Batch processing: Batch processing refers to processing a high volume
of data in batches within a specific time span. Batch processing is
useful when there is a longer turnaround time between collecting and
analyzing data.
• Stream processing looks at small batches of data at
once, shortening the delay time between collection and
analysis for quicker decision-making. Stream processing
is more complex and often more expensive.
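
To make the batch/stream contrast concrete, here is a minimal, illustrative Python sketch; it is not tied to any particular big data product, and the "sensor readings" are invented for the example.

# Illustrative sketch only: contrasts batch and stream processing styles
# in plain Python. Real systems use frameworks such as Hadoop (batch)
# or Storm / Spark Streaming (stream).

records = [12, 7, 33, 5, 48, 21]  # pretend these are sensor readings

# Batch processing: collect everything first, then analyze in one pass.
def batch_average(all_records):
    return sum(all_records) / len(all_records)

print("Batch average:", batch_average(records))

# Stream processing: update the result as each record arrives,
# so an answer is available immediately after every reading.
def stream_averages(record_stream):
    total, count = 0, 0
    for value in record_stream:
        total += value
        count += 1
        yield total / count  # running average after each new record

for running_avg in stream_averages(records):
    print("Running average so far:", running_avg)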

●7
BIG DATA ANALYTICS

• CLEANING:
• Data big or small requires scrubbing to improve data quality
and get stronger results.
• Process of fixing incorrect, incomplete, duplicate or
otherwise erroneous data in a data set.
• It involves identifying data errors and then changing,
updating or removing data to correct them.
• All data must be formatted correctly, and any duplicative or
irrelevant data must be eliminated or accounted for.
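
A minimal sketch of these cleaning steps, using the Python pandas library on an invented table (the column names and values are hypothetical):

# Minimal data-cleaning sketch with pandas (hypothetical columns/values).
import pandas as pd

raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "signup_date": ["2021-01-05", "2021-02-30", "2021-02-11", None, "2021-03-09"],
    "spend":       ["100", "250", "250", "abc", "75"],
})

cleaned = raw.drop_duplicates(subset="customer_id").copy()     # remove duplicate records
cleaned["signup_date"] = pd.to_datetime(cleaned["signup_date"],
                                        errors="coerce")       # flag invalid dates as missing
cleaned["spend"] = pd.to_numeric(cleaned["spend"],
                                 errors="coerce")              # flag invalid numbers as missing
cleaned = cleaned.dropna()                                     # drop rows that could not be repaired

print(cleaned)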

●8
BIG DATA ANALYTICS
• ANALYZING:
• Data mining sorts through large datasets to identify patterns
and relationships by identifying anomalies and creating data
clusters.

• Predictive analytics uses an organization's historical data to make
predictions about the future, identifying upcoming risks and
opportunities (see the sketch at the end of this slide).

• Deep learning imitates human learning patterns by using artificial
intelligence and machine learning to layer algorithms and find patterns
in the most complex and abstract data.
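
As a small illustration of the analysis techniques above, the following Python sketch uses scikit-learn on tiny synthetic data: clustering for data mining, and a simple model trained on "historical" records for predictive analytics (all numbers are invented; deep learning is not shown):

# Illustrative sketch of data mining and predictive analytics with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Data mining: group customers into clusters by (age, yearly spend).
customers = np.array([[22, 300], [25, 350], [47, 1200], [52, 1400], [31, 500]])
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(customers)
print("Cluster per customer:", clusters)

# Predictive analytics: learn from historical data (tenure, monthly usage -> churned)
# and predict the outcome for a new customer.
history_X = np.array([[1, 200], [3, 50], [10, 500], [2, 80], [8, 400]])
history_y = np.array([1, 1, 0, 1, 0])          # 1 = churned in the past
model = LogisticRegression().fit(history_X, history_y)
print("Churn prediction for new customer:", model.predict([[4, 120]]))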

●9
BIG DATA ANALYTICS Tools
• Hadoop is an open-source framework that efficiently stores and
processes big datasets on clusters of commodity hardware. This framework
is free and can handle large amounts of structured and unstructured data.
• NoSQL databases are non-relational data management systems that do not
require a fixed schema, making them a great option for big, raw,
unstructured data.
• MapReduce is an essential component of the Hadoop framework, serving
two functions. The first is mapping, which filters and distributes data
to various nodes within the cluster. The second is reducing, which
organizes and reduces the results from each node to answer a query.
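
The classic illustration of the map and reduce phases is word counting. The sketch below mimics the two phases in plain Python; it is a conceptual model only, not Hadoop's actual API.

# Conceptual word-count sketch of the MapReduce idea in plain Python.
from collections import defaultdict

documents = ["big data needs big tools", "data tools for big data"]

# Map phase: each document is turned into (word, 1) pairs,
# as if emitted by mapper tasks on different nodes.
mapped = []
for doc in documents:
    for word in doc.split():
        mapped.append((word, 1))

# Shuffle: group the pairs by key (word).
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: combine the values for each key to answer the query.
reduced = {word: sum(counts) for word, counts in groups.items()}
print(reduced)   # e.g. {'big': 3, 'data': 3, ...}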

●10
BIG DATA ANALYTICS Tools
• YARN stands for "Yet Another Resource Negotiator". It is another
component of second-generation Hadoop. This cluster management technology
helps with job scheduling and resource management in the cluster.
• Spark is an open source cluster computing framework that can handle
both batch and stream processing for fast computation (see the sketch at
the end of this slide).
• Tableau is an end-to-end data analytics platform that allows
you to prep, analyze, collaborate, and share your big data
insights. Tableau excels in self-service visual analysis,
allowing people to ask new questions of governed big data
and easily share those insights across the organization.
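
A minimal sketch of a batch word count using Spark's Python API (PySpark); it assumes the pyspark package is installed locally, and the sample lines are invented.

# Minimal PySpark word-count sketch.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize(["spark handles batch jobs", "spark also handles stream jobs"])
counts = (lines.flatMap(lambda line: line.split())   # split lines into words
               .map(lambda word: (word, 1))          # emit (word, 1) pairs
               .reduceByKey(lambda a, b: a + b))     # sum counts per word

print(counts.collect())   # list of (word, count) pairs
spark.stop()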

●11
BENEFITS OF BIG DATA ANALYTICS

• Cost savings. Helping organizations identify ways to do business more
efficiently.
• Product development. Providing a better understanding of customer
needs.
• Market insights. Tracking purchase behavior and market trends.

●12
CHALLENGES OF BIG DATA
• Making big data accessible. Collecting and processing data
becomes more difficult as the amount of data grows.
• Maintaining quality data. With so much data to maintain,
organizations are spending more time than ever before
scrubbing for duplicates, errors, absences, conflicts, and
inconsistencies.
• Keeping data secure. As the amount of data grows, so do
privacy and security concerns.
• Finding the right tools and platforms. New technologies for
processing and analyzing big data are developed all the time.
Organizations must find the right technology to work within
their established ecosystems and address their particular
needs.
●13
BIG DATA IN INDUSTRIES

• Education industry
• Insurance industry
• Government Sectors
• Banking Sectors

●14
BIG DATA IN BANKING SECTOR

• Study and analysis of big data can help detect


• The misuse of credit cards
• Misuse of debit cards
• Business clarity
• Customer statistics alteration
• Money laundering
• Risk Mitigation

●15
REAL TIME BIG DATA TOOLS

• Real time big data analytics tools


• Storm
• Cloudera
• GridGain
• SpaceCurve

●16
CLOUD COMPUTING
• Cloud is a bunch of computers networked together in the same or
different geographical locations, operating together to serve a number of
customers with different needs and workloads on an on-demand basis with
the help of virtualization.
• Cloud users don't have to worry about installing and maintaining the
hardware and software needed.
• Cloud is essentially provided by large distributed data centers. These
data centers are often organized as a grid, and the cloud is built on top
of grid services.
• Cloud users are provided with virtual images of the physical
machines in the data centers.

●17
TYPE OF CLOUD
• Private Cloud :- This type of cloud is maintained within an
organization and used solely for its internal purposes. Security and
network bandwidth are not critical issues for a private cloud.

• Public Cloud :- In this type, an organization rents cloud services from
cloud providers on an on-demand basis.

●18
TYPE OF CLOUD

• Hybrid Cloud :- This type of cloud is composed of multiple internal or
external clouds. This is the scenario when an organization moves to the
public cloud computing domain from its internal private cloud.

• Community Cloud :- A community cloud is a collaborative effort in which
infrastructure is shared between several organizations from a specific
community with common concerns, whether managed internally or by a third
party and hosted internally or externally.

●19
CLOUD STAKEHOLDERS
• There are three types of stakeholders
• Cloud providers
• Cloud users
• End users

• Cloud providers provide cloud services to the cloud users.
• The cloud users develop their product using these services and deliver
the product to the end users.

●20
ADVANTAGES OF CLOUD
• Cloud Providers' point of view
• Most of the data centers today are underutilized; they are often only
about 15% utilized.
• Large companies having those data centers can easily rent that
computing power to other organizations and profit from it.
• Cloud Users' point of view
• Cloud users need not take care of the hardware and software they use,
and they don't have to worry about maintenance.
• Cloud users can use the resources on an on-demand basis and pay for as
much as they use.

●21
CLOUD ARCHITECTURE

●23
CLOUD COMPUTING & GRID
COMPUTING

●24
CLOUD COMPUTING & GRID
COMPUTING

●25
TYPES OF UTILITY CLOUD SERVICES

• Utility computing services provided by the cloud provider can be
classified by the type of service.
• These services are typically represented as XaaS, where X can be
replaced by Infrastructure, Platform, Hardware, Software, Desktop, Data,
etc.
• There are three main types of services that are most widely accepted:
• Software as a Service
• Platform as a Service
• Infrastructure as a Service
• These services provide different levels of abstraction and
flexibility to the cloud users.

●26
SOFTWARE AS A SERVICE
• Under SaaS, the software publisher (seller) runs and maintains all
necessary hardware and software.
• The customer of SaaS accesses the applications through the Internet.
• For example, Salesforce.com, with yearly revenues of over $300M, offers
on-demand Customer Relationship Management software solutions. This
application runs on Salesforce.com's own infrastructure and is delivered
directly to the users over the Internet. Salesforce does not sell
perpetual licenses; instead it charges a monthly subscription fee
starting at $65/user/month.
• Google Docs is also an example of SaaS, where users can create, edit,
delete and share their documents, spread-sheets or presentations.

●27
PLATFORM AS A SERVICE

• Delivers development environment as a service


• One can build his/her own applications that run on the provider's
infrastructure, which supports transactions, uniform authentication,
robust scalability and availability.
• The applications built using PaaS are offered as SaaS
and consumed directly from the end users' web
browsers.
• Ex:- Windows Azure

●28
INFRASTRUCTURE AS A SERVICE

• A company providing IaaS gives developers OS-level control, CPU clocks,
bandwidth and load balancers.
• The pool of hardware resources is extracted from multiple servers and
networks, usually distributed across numerous data centers. This provides
redundancy and reliability to IaaS.
• For small-scale businesses looking to cut costs on IT infrastructure,
IaaS is one of the solutions.
• Annually, a lot of money is spent on maintenance and on buying new
components like hard drives, network connections, external storage
devices etc., which a business owner could have saved for other expenses
by using IaaS.

●29
Cloud Computing Architecture
• Cloud Computing Architecture is a combination of
components required for a Cloud Computing service like a
front-end platform, a back-end platform or servers, a network
or internet service, and a cloud based delivery service.
• Front End - The front end consists of the client part of the cloud
computing system. It comprises the interfaces and applications that are
required to access the Cloud Computing or Cloud Programming platform.
Example - Web Browser.
• Back End - The back end refers to the cloud itself; it comprises the
resources that are required for cloud computing services. It consists of
virtual machines, servers, data storage, security mechanisms etc. It is
under the provider's control.

●32
Virtualization

• The main enabling technology for Cloud Computing is Virtualization.
• Virtualization is the partitioning of a single physical server into
multiple logical servers.
• Once the physical server is divided, each logical server
behaves like a physical server and can run an operating
system and applications independently.
• Virtualization is mainly used for three main purposes - Network
Virtualization, Server Virtualization and Storage Virtualization

●33
Network Virtualization

• It is a method of combining the available resources in a network by
splitting up the available bandwidth into channels, each of which is
independent of the others and can be assigned to a specific server or
device in real time.

●34
Storage Virtualization

• It is the pooling of physical storage from multiple network storage
devices into what appears to be a single storage device that is managed
from a central console. Storage virtualization is commonly used in
storage area networks.

●36
Server Virtualization

• The intention of server virtualization is to increase the sharing of
resources such as processors, RAM, operating systems etc., and to reduce
the burden and complexity of computation for users.
• Similarly, cloud computing can have hardware virtualization, server
virtualization, OS virtualization and data virtualization.

●37
Grid Computing

• Grid computing is also known as distributed computing.
• It is a processor architecture that combines various computing
resources from multiple locations to achieve a common goal.
• Grid computing contains the following three types of machines:
• Control Node: It is a group of servers which administers the whole
network.
• Provider: It is a computer which contributes its resources to the
network resource pool.
• User: It is a computer which uses the resources on the network.

●38
Challenges in Cloud Computing

• Security and Privacy (of information)
• Portability (of applications among cloud providers)
• Interoperability (among different platforms)
• Computing Performance (requirement of high network bandwidth)
• Reliability and Availability

●39
Artificial Intelligence

• AI (Artificial Intelligence) is the ability of a machine to perform
cognitive functions as humans do, such as perceiving, learning, reasoning
and solving problems.
• The benchmark for AI is the human level in terms of reasoning, speech,
and vision.

●40
AI Levels

• Narrow AI: An artificial intelligence is said to be narrow when the
machine can perform a specific task better than a human.
• General AI: An artificial intelligence reaches the general state when
it can perform any intellectual task with the same accuracy level as a
human would.
• Strong AI: An AI is strong when it can beat humans in many tasks.

●41
Fields of AI

• Gaming
• Natural Language Processing (understands human
language)
• Expert Systems (Provides reasoning and advising)
• Vision systems (These systems understand, interpret, and comprehend
visual input on the computer, e.g. recognising the face of a criminal)
• Speech Recognition
• Handwriting Recognition
• Intelligent Robots

●42
Applications of AI
• Gaming
• Astronomy
• Healthcare
• Data Security
• Social Media
• Travel & Transport
• Automotive Industry
• Robotics
• Entertainment
• Agriculture
• E-Commerce

●43
Subsets of AI

• Machine Learning (Deep Learning)


• Natural Language Processing
• Expert systems
• Robotics
• Machine Vision
• Speech Recognition

●44
Machine Learning

• Machine learning is a part of AI which provides machines with the
intelligence and the ability to automatically learn from experience
without being explicitly programmed.
• It is primarily concerned with the design and development
of algorithms that allow the system to learn from historical
data.
• Machine Learning is based on the idea that machines can
learn from past data, identify patterns, and make decisions
using algorithms.
• Machine learning algorithms are designed in such a way
that they can learn and improve their performance
automatically

●45
Types of Machine Learning

• Supervised learning: Supervised learning is a type of machine learning
in which machines learn from known datasets (a set of training examples)
and then predict the output.
• Reinforcement learning: Reinforcement learning is a type of learning in
which an AI agent is trained by giving it some commands, and on each
action the agent gets a reward as feedback. Using this feedback, the
agent improves its performance.
• Unsupervised learning: Unsupervised learning is associated with
learning without supervision or training. In unsupervised learning, the
algorithms are trained with data which is neither labeled nor classified.
The agent needs to learn from patterns without corresponding output
values.
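
A minimal sketch of the supervised/unsupervised distinction, using Python's scikit-learn library and its bundled iris dataset (reinforcement learning needs an interactive environment and is not shown):

# Supervised vs. unsupervised learning with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the model sees labeled training examples, then predicts labels.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = KNeighborsClassifier().fit(X_train, y_train)
print("Supervised accuracy:", clf.score(X_test, y_test))

# Unsupervised: the same measurements without labels; the algorithm must
# discover structure (clusters) on its own.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("First ten cluster assignments:", clusters[:10])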
●46
List of AI tools
• Scikit Learn
• TensorFlow
• Theano
• Caffe
• MxNet
• Keras
• PyTorch
• CNTK
• AutoML
• OpenNN
• H2O: Open Source AI Platform
• Google ML Kit

●47
© BHARAT SANCHAR NIGAM LIMITED
