Big Data - Cloud - AI
BIG DATA
• Big data is a field that treats ways to analyze and systematically
extract information from data sets that are too large or
complex to be handled by traditional data-processing software.
• Big data analysis challenges include capturing data, data
storage, data analysis, search, sharing, transfer, visualization,
querying, updating, information privacy, and data source.
• In simple words, big data often includes data with sizes that
exceed the capacity of traditional software to process within
an acceptable time and value.
• Big data was originally associated with three key concepts:
volume, variety and velocity.
VOLUME of data
• Big data represents a volume of data that cannot be
processed by traditional data processing software.
VELOCITY of data
• If data is produced quickly and the time available to
analyze it accurately is short, big data analytics can be
used.
• While some forms of data can be batch processed and
remain relevant over time, much of big data streams into
organizations at high speed and requires immediate action for the
best outcomes.
• Sensor data from health devices is a great example. The
ability to instantly process health data can provide users and
physicians with potentially life-saving information.
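The velocity idea above, acting on readings as they arrive rather than in a later batch, can be sketched with a fixed-size sliding window. This is a minimal illustration, not a real streaming engine: the heart-rate stream, the 3-reading window and the 120 bpm alert limit are hypothetical values chosen for the example.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def monitor(readings: Iterable[int], window: int = 3,
            limit: float = 120.0) -> Iterator[Tuple[int, float]]:
    """Yield (position, moving average) whenever the average of the
    last `window` readings exceeds `limit` (hypothetical alert rule)."""
    recent = deque(maxlen=window)  # older readings drop off automatically
    for i, bpm in enumerate(readings):
        recent.append(bpm)
        if len(recent) == window:
            avg = sum(recent) / window
            if avg > limit:
                yield i, avg  # act immediately instead of waiting for a batch


# Hypothetical heart-rate readings (beats per minute) arriving one by one.
stream = [80, 85, 90, 130, 140, 150, 95]
for i, avg in monitor(stream):
    print(f"alert at reading {i}: average {avg:.1f} bpm")
```

With these readings, alerts fire at positions 5 and 6; only the last three values are ever held in memory, which is what lets this style of processing keep up with a continuous stream.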
VARIETY of data
• Big Data Variety refers to the different types of data collected
and processed in a big data environment.
• It includes structured, semi-structured, and unstructured data.
• Everything from emails and videos to scientific and
meteorological data can constitute a big data stream, each
with its own unique attributes.
• Big data analysis provides a way for data engineers to
integrate the vast amounts of complex information created by
sensors, networks, transactions, smart devices, web usage,
and more.
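As a rough sketch of the three kinds of data named above, the snippet below reads one sample of each using only the Python standard library. The sample records are hypothetical; the point is how much structure the reader can rely on in each case.

```python
import csv
import io
import json

# Structured: fixed columns, e.g. an export from a relational table.
structured = "id,name,amount\n1,Asha,250\n2,Ravi,400\n"
# Semi-structured: self-describing keys; fields may differ per record.
semi_structured = '{"id": 3, "name": "Meena", "tags": ["sensor", "web"]}'
# Unstructured: free text with no schema at all, e.g. an email body.
unstructured = "Hi team, the tower in sector 5 dropped calls twice today."

rows = list(csv.DictReader(io.StringIO(structured)))  # every row has the same keys
record = json.loads(semi_structured)                   # keys are discovered when read
words = unstructured.lower().split()                   # any structure must be inferred

print(rows[0]["amount"])       # access by a column name known in advance
print(record.get("tags", []))  # optional field that other records may lack
print(len(words))              # about all we know without further analysis
```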
BIG DATA ANALYTICS
• Big data analytics refers to collecting, processing, cleaning,
and analyzing large datasets to help organizations
operationalize their big data.
• COLLECTION: With today's technology, organizations can
gather both structured and unstructured data from a variety
of sources, such as cloud storage, mobile applications and video
servers.
• This data can be stored in data warehouses where business
intelligence tools and digital solutions can access it easily.
• Raw or unstructured data that is too diverse or complex for a
warehouse may be assigned metadata and stored in a data
lake.
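The warehouse-versus-lake decision described above can be sketched as a simple routing rule. This assumes a hypothetical fixed warehouse schema (`order_id`, `customer`, `amount`); anything that does not match is kept raw in an in-memory stand-in for a data lake, wrapped with a few metadata fields. Real warehouses and lakes are separate storage systems, not Python lists.

```python
from datetime import datetime, timezone

# Hypothetical fixed schema of one warehouse table.
WAREHOUSE_SCHEMA = {"order_id", "customer", "amount"}

warehouse = []  # stand-in for a warehouse table: rows that fit the schema
data_lake = []  # stand-in for a data lake: raw objects plus metadata


def ingest(item, source):
    """Send schema-conforming records to the warehouse, everything else to the lake."""
    if isinstance(item, dict) and set(item) == WAREHOUSE_SCHEMA:
        warehouse.append(item)
    else:
        data_lake.append({
            "payload": item,              # kept unchanged, in its raw form
            "source": source,             # metadata: where it came from
            "type": type(item).__name__,  # metadata: what kind of object it is
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        })


ingest({"order_id": 1, "customer": "A-17", "amount": 99.0}, "mobile_app")
ingest(b"\x00\x01\x02\x03", "video_server")              # binary blob
ingest({"clicks": [3, 7], "session": "x1"}, "web_logs")  # does not match the schema
print(len(warehouse), len(data_lake))
```

Only the first record lands in the warehouse; the binary blob and the click log go to the lake, where the attached metadata is what later lets someone find and interpret them.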
BIG DATA ANALYTICS
• CLEANING:
• Data, big or small, requires scrubbing to improve data quality
and get stronger results.
• Data cleaning is the process of fixing incorrect, incomplete,
duplicate or otherwise erroneous data in a data set.
• It involves identifying data errors and then changing,
updating or removing data to correct them.
• All data must be formatted correctly, and any duplicative or
irrelevant data must be eliminated or accounted for.
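The cleaning steps listed above (fixing formats, removing incomplete or erroneous values, and eliminating duplicates) might look like this on a small batch of records. The records, the field names and the rule of using `id` as the duplicate key are assumptions made for illustration.

```python
raw = [
    {"id": "1", "name": " Asha ", "city": "delhi", "amount": "250"},
    {"id": "2", "name": "Ravi", "city": "MUMBAI", "amount": ""},     # incomplete
    {"id": "1", "name": "Asha", "city": "Delhi", "amount": "250"},   # duplicate of id 1
    {"id": "3", "name": "Meena", "city": "Pune", "amount": "abc"},   # erroneous value
    {"id": "4", "name": "John", "city": "chennai", "amount": "400"},
]


def clean(records):
    seen = set()
    out = []
    for r in records:
        # Fix formatting: trim stray spaces and normalise capitalisation.
        name = r["name"].strip()
        city = r["city"].strip().title()
        # Drop records whose amount is missing or not a number.
        try:
            amount = float(r["amount"])
        except ValueError:
            continue
        # Drop duplicates, treating "id" as the (hypothetical) unique key.
        if r["id"] in seen:
            continue
        seen.add(r["id"])
        out.append({"id": int(r["id"]), "name": name, "city": city, "amount": amount})
    return out


cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Only ids 1 and 4 survive. Whether a bad record should be dropped, as here, or corrected and kept is a policy decision that depends on the data set.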
BIG DATA ANALYTICS
• ANALYZING:
• Data mining sorts through large datasets to identify patterns
and relationships by identifying anomalies and creating data
clusters.
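Two common ways of finding anomalies and clusters can be sketched in plain Python: a z-score test that flags values far from the mean, and a basic one-dimensional k-means. These are generic textbook techniques chosen for illustration, not a specific data-mining product; the usage figures, the threshold of 2 standard deviations and the starting centres are hypothetical.

```python
from statistics import mean, pstdev


def anomalies(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def kmeans_1d(values, centres, iterations=10):
    """Basic k-means on one-dimensional data, starting from the given centres."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centre.
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda k: abs(v - centres[k]))
            clusters[nearest].append(v)
        # Update step: move each centre to the mean of its members.
        centres = [mean(c) if c else centres[k] for k, c in enumerate(clusters)]
    return centres, clusters


usage = [12, 15, 14, 13, 80, 82, 79, 300]  # hypothetical daily data usage (MB)
outliers = anomalies(usage)
normal = [v for v in usage if v not in outliers]
centres, clusters = kmeans_1d(normal, centres=[0, 100])
print(outliers)  # [300]
print(clusters)  # [[12, 15, 14, 13], [80, 82, 79]]
```

Removing the outlier before clustering keeps a single extreme value from dragging a cluster centre away from the bulk of the data.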
BIG DATA ANALYTICS Tools
• Hadoop is an open-source framework that efficiently stores
and processes big datasets on clusters of commodity
hardware. This framework is free and can handle large
amounts of structured and unstructured data.
• NoSQL databases are non-relational data management
systems that do not require a fixed schema, making them a
great option for big, raw, unstructured data.
• MapReduce is an essential component of the Hadoop
framework and serves two functions. The first is mapping, which
filters and transforms the data on the various nodes within the
cluster, producing intermediate key-value pairs. The second is
reducing, which organizes and reduces the results from each
node to answer a query.
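The map and reduce steps can be simulated in a single Python process with the classic word-count example. This only mimics the programming model (map, then a shuffle that groups values by key, then reduce); it does not use the Hadoop API, and in a real cluster each input split would be mapped on a different node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key into a single result."""
    return key, sum(values)


splits = ["big data needs big clusters", "data moves to nodes", "nodes process data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)  # one map task per split
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["data"], counts["big"], counts["nodes"])  # 3 2 2
```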
BIG DATA ANALYTICS Tools
• YARN stands for "Yet Another Resource Negotiator". It is
another component of second-generation Hadoop. This
cluster management technology helps with job scheduling
and resource management in the cluster.
• Spark is an open-source cluster computing framework that
can handle both batch and stream processing for fast
computation.
• Tableau is an end-to-end data analytics platform that allows
you to prep, analyze, collaborate, and share your big data
insights. Tableau excels in self-service visual analysis,
allowing people to ask new questions of governed big data
and easily share those insights across the organization.
BENEFITS OF BIG DATA ANALYTICS
CHALLENGES OF BIG DATA
• Making big data accessible. Collecting and processing data
becomes more difficult as the amount of data grows.
• Maintaining quality data. With so much data to maintain,
organizations are spending more time than ever before
scrubbing for duplicates, errors, absences, conflicts, and
inconsistencies.
• Keeping data secure. As the amount of data grows, so do
privacy and security concerns.
• Finding the right tools and platforms. New technologies for
processing and analyzing big data are developed all the time.
Organizations must find the right technology to work within
their established ecosystems and address their particular
needs.
BIG DATA IN INDUSTRIES
BIG DATA IN BANKING SECTOR
REAL TIME BIG DATA TOOLS
CLOUD COMPUTING
• A cloud is a set of computers networked together in the same
or different geographical locations, operating together to
serve many customers with different needs and workloads
on an on-demand basis, with the help of virtualization.
• Cloud users don't have to worry about installing and
maintaining the hardware and software they need.
• The cloud is essentially provided by large distributed data
centers. These data centers are often organized as a grid, and
the cloud is built on top of the grid services.
• Cloud users are provided with virtual images of the physical
machines in the data centers.
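One way to picture how a provider packs customers' virtual machines onto shared physical hosts is a first-fit placement rule. This is a deliberately simplified sketch: the host sizes, VM requests and the first-fit policy itself are illustrative assumptions, and production schedulers weigh many more factors.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical machine in the data center (capacities are illustrative)."""
    name: str
    cpus: int
    ram_gb: int
    vms: list = field(default_factory=list)  # (vm_name, cpus, ram_gb) tuples

    def fits(self, cpus: int, ram_gb: int) -> bool:
        used_cpu = sum(vm[1] for vm in self.vms)
        used_ram = sum(vm[2] for vm in self.vms)
        return used_cpu + cpus <= self.cpus and used_ram + ram_gb <= self.ram_gb


def place(hosts, requests):
    """First fit: put each VM on the first host with enough free CPU and RAM."""
    placement = {}
    for vm_name, cpus, ram_gb in requests:
        for host in hosts:
            if host.fits(cpus, ram_gb):
                host.vms.append((vm_name, cpus, ram_gb))
                placement[vm_name] = host.name
                break
        else:
            placement[vm_name] = None  # no host has room: queue or reject the request
    return placement


hosts = [Host("host-a", cpus=16, ram_gb=64), Host("host-b", cpus=16, ram_gb=64)]
requests = [("web-1", 4, 8), ("db-1", 8, 32), ("ml-1", 8, 32),
            ("web-2", 4, 8), ("big-1", 32, 128)]
placement = place(hosts, requests)
print(placement)
```

Several customers' VMs end up sharing `host-a`, while `big-1` gets no host because it is larger than any single machine; each user still sees only its own virtual machine.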
TYPES OF CLOUD
• Private Cloud: This type of cloud is maintained within an
organization and used solely for its internal purposes.
Security and network bandwidth are not critical issues for a
private cloud.
CLOUD STAKEHOLDERS
• There are three types of stakeholders:
• Cloud providers
• Cloud users
• End users
ADVANTAGES OF CLOUD
• Cloud Providers' point of view
• Most data centers today are underutilized, typically
running at only about 15% utilization.
• Large companies that own those data centers can easily
rent that computing power to other organizations and
profit from it.
• Cloud Users' point of view
• Cloud users need not take care of the hardware
and software they use, nor do they have to
worry about maintenance.
• Cloud users can use resources on demand
and pay only for what they use.
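The pay-per-use point, together with the low utilization figure for data centers given earlier on this slide, can be made concrete with a little arithmetic. All prices here are hypothetical currency units; the sketch only compares the cost per useful hour of an owned server that sits idle most of the time with paying for rented hours actually used.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 hours per year / 12)


def owned_cost_per_useful_hour(monthly_cost, utilization):
    """An owned server costs the same whether busy or idle, so low
    utilization raises the cost of every hour of real work."""
    return monthly_cost / (HOURS_PER_MONTH * utilization)


def on_demand_bill(rate_per_hour, hours_used):
    """Pay-as-you-go: the bill depends only on the hours actually consumed."""
    return rate_per_hour * hours_used


# Hypothetical figures, in any currency unit.
print(round(owned_cost_per_useful_hour(730.0, 0.15), 2))  # 6.67 per useful hour
print(round(owned_cost_per_useful_hour(730.0, 1.00), 2))  # 1.0 per hour if fully busy
print(on_demand_bill(2.0, 110))                            # 220.0 for 110 rented hours
```

At 15% utilization each useful hour of the owned server costs about 6.7 times what it would if the machine were fully busy, which is why renting out idle capacity benefits providers and pay-per-use benefits users.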
CLOUD ARCHITECTURE
CLOUD COMPUTING & GRID COMPUTING
TYPES OF UTILITY CLOUD SERVICES
SOFTWARE AS A SERVICE
• Under SaaS, the software publisher (seller) runs and maintains all
necessary hardware and software.
• The customer of SaaS accesses the applications through the Internet.
• For example, Salesforce.com, with yearly revenues of over $300M,
offers on-demand Customer Relationship Management software
solutions. This application runs on Salesforce.com's own
infrastructure and is delivered directly to users over the Internet.
Salesforce does not sell perpetual licenses; instead, it charges a
monthly subscription fee starting at $65/user/month.
• Google Docs is also an example of SaaS, where users can
create, edit, delete and share their documents, spreadsheets or
presentations.
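The per-seat arithmetic of the subscription model described for Salesforce can be sketched as follows, using its $65/user/month starting price; the team sizes and durations are hypothetical.

```python
PRICE_PER_USER_MONTH = 65.0  # starting price quoted for Salesforce, in US dollars


def subscription_cost(users, months, price=PRICE_PER_USER_MONTH):
    """Per-seat subscription: pay users x months x price; no perpetual licence."""
    return users * months * price


# Hypothetical 20-person sales team over one year.
print(subscription_cost(users=20, months=12))  # 15600.0
# Five extra seats for a single quarter cost only for that period.
print(subscription_cost(users=5, months=3))    # 975.0
```

Because the cost scales with seats and months rather than with an up-front licence, a customer can add or drop users as needs change.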
PLATFORM AS A SERVICE
INFRASTRUCTURE AS A SERVICE
Cloud Computing Architecture
• Cloud Computing Architecture is a combination of
components required for a Cloud Computing service, such as a
front-end platform, a back-end platform or servers, a network
or internet service, and a cloud-based delivery service.
• Front End - The front end consists of the client part of the cloud
computing system. It comprises the interfaces and applications that are
required to access the Cloud Computing or Cloud
Programming platform. Example - Web Browser.
• Back End - The back end refers to the cloud itself. It comprises
the resources that are required for cloud computing services. It
consists of virtual machines, servers, data storage, security
mechanisms, etc. It is under the provider's control.
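The front-end/back-end split can be sketched locally with the Python standard library: a tiny HTTP service stands in for the provider-controlled back end, and a client request stands in for the front end (a web browser in the slide's example). The stored file and the JSON response format are hypothetical; this shows only the division of roles, not how any real cloud service is built.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class BackEnd(BaseHTTPRequestHandler):
    """Back end: a provider-controlled service holding a (toy) storage resource."""

    files = {"report.txt": "quarterly numbers"}  # hypothetical stored data

    def do_GET(self):
        name = self.path.lstrip("/")
        status = 200 if name in self.files else 404
        body = json.dumps({"name": name, "content": self.files.get(name)}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging for this demo


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), BackEnd)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Front end: a client application that reaches the back end over the network.
url = f"http://127.0.0.1:{server.server_port}/report.txt"
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # no proxies
with opener.open(url) as response:
    result = json.loads(response.read())
print(result)  # {'name': 'report.txt', 'content': 'quarterly numbers'}

server.shutdown()
server.server_close()
```

The client never sees where or how the data is stored; it only knows the network address and the response format, which is the separation the front-end/back-end description is drawing.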
Virtualization
Network Virtualization
Storage Virtualization
Server Virtualization
Grid Computing
Challenges in Cloud Computing
Artificial Intelligence
AI Levels
Fields of AI
• Gaming
• Natural Language Processing (understands human
language)
• Expert Systems (Provides reasoning and advising)
• Vision systems (These systems understand, interpret, and
comprehend visual input on the computer, e.g. recognising the
face of a criminal.)
• Speech Recognition
• Handwriting Recognition
• Intelligent Robots
Applications of AI
• Gaming
• Astronomy
• Healthcare
• Data Security
• Social Media
• Travel & Transport
• Automotive Industry
• Robotics
• Entertainment
• Agriculture
• E-Commerce
Subsets of AI
Machine Learning
Types of Machine Learning
© BHARAT SANCHAR NIGAM LIMITED