
Hamdard Institute of Engineering & Technology

Department of Computing
Program: BSCS

COURSE TITLE: CLOUD COMPUTING
SEMESTER/YEAR: 7TH SEMESTER-2023
COURSE INSTRUCTOR: SIR KASHIF AALAM
ASSIGNMENT TITLE: BIG DATA ECOSYSTEM
ASSIGNMENT NO: 05
NAME: TAYYABA AROOJ
CMS ID: 1603-2020
SIGNATURE: TAYYABA
*By signing above, you attest that you have contributed to this submission and confirm that all
work you have contributed is your own. Any suspicion of copying or plagiarism in this work will
trigger an Academic Misconduct investigation and may result in a “0” on the work, an “F” in the
course, or more severe penalties.
BIG DATA ECOSYSTEM:


The big data ecosystem refers to the collection of tools, frameworks, and technologies that work
together to store, process, and analyze massive volumes of data. Common tools used to
manipulate big data include Apache Hadoop, MapReduce, and Bigtable.
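
To make the MapReduce model concrete, here is a minimal sketch of the word-count pattern that Hadoop popularized, written in plain Python. The map and reduce functions and the sample documents are illustrative assumptions, not Hadoop's actual API.

```python
from collections import defaultdict

# Minimal sketch of the MapReduce word-count pattern (illustrative only;
# a real Hadoop job distributes these phases across a cluster).

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

documents = ["big data needs big tools", "data tools process data"]
all_pairs = (pair for doc in documents for pair in map_phase(doc))
print(reduce_phase(all_pairs))  # e.g. {'big': 2, 'data': 3, ...}
```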

COMPONENTS OF BIG DATA ECOSYSTEMS:


Three components of big data ecosystems are:
1. Data sources
2. Data management (integration, storage, and processing)
3. Data analytics, Business intelligence (BI) and knowledge discovery (KD)
1. Data Sources:
Data sources are the origin points of information that contribute to a big data ecosystem. These
sources can be diverse and include structured, semi-structured, and unstructured data. Examples
of data sources include:
Structured Data Sources: Relational databases, spreadsheets.
Semi-Structured Data Sources: XML files, JSON files, log files.
Unstructured Data Sources: Text documents, social media feeds, multimedia content.
The challenge lies in efficiently capturing, ingesting, and integrating data from these various
sources to create a comprehensive and cohesive dataset for analysis and decision-making.
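
As a rough illustration of that challenge, the sketch below ingests a structured source (CSV) and a semi-structured source (JSON lines) into one common record format. The sample data and field names are hypothetical.

```python
import csv
import io
import json

# Sketch: normalize records from a structured source (CSV) and a
# semi-structured source (JSON lines) into one unified list of dicts.
# The sample data and field names below are hypothetical.

csv_source = io.StringIO("user_id,amount\n1,9.99\n2,14.50\n")
json_source = io.StringIO('{"user_id": 3, "amount": 5.25}\n'
                          '{"user_id": 4, "amount": 7.00}\n')

records = []

# Structured: every row follows the same fixed schema.
for row in csv.DictReader(csv_source):
    records.append({"user_id": int(row["user_id"]),
                    "amount": float(row["amount"])})

# Semi-structured: each line is self-describing JSON.
for line in json_source:
    obj = json.loads(line)
    records.append({"user_id": obj["user_id"], "amount": obj["amount"]})

print(records)  # one cohesive dataset, ready for downstream analysis
```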
2. Data Management (Integration, Storage, and Processing):
Data management encompasses the processes and technologies involved in handling data
throughout its lifecycle. This component includes three critical aspects:
• Integration: Involves combining data from different sources to provide a unified view.
Extract, transform, load (ETL) tools cleanse data and convert it into a common format for
analysis.
• Storage: Addresses where and how data is stored. This includes traditional databases, data
warehouses, and modern storage solutions like cloud-based data lakes. The choice of
storage solution often depends on factors such as data volume, variety, and access patterns.
• Processing: Involves the computation and analysis of data. Big data processing
frameworks such as Apache Spark and Apache Flink enable distributed processing of large
datasets and are crucial for handling the volume and velocity of big data (see the sketch
after this section).
Effective data management ensures that data is organized, accessible, and processed efficiently,
laying the foundation for meaningful analytics and insights.
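
As a hedged illustration of distributed processing, the sketch below uses Apache Spark's Python API (PySpark) to run a parallel aggregation. It assumes PySpark is installed; the input path and the column names "region" and "amount" are assumptions for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Sketch: a minimal PySpark aggregation. The file path and column names
# ("region", "amount") are hypothetical.
spark = SparkSession.builder.appName("big-data-sketch").getOrCreate()

# Spark reads the file and partitions it across the cluster's executors.
sales = spark.read.csv("sales.csv", header=True, inferSchema=True)

# groupBy/agg runs in parallel on each partition, then merges the results.
totals = sales.groupBy("region").agg(F.sum("amount").alias("total_amount"))
totals.show()

spark.stop()
```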
3. Data Analytics, Business Intelligence (BI), and Knowledge Discovery (KD):
This component focuses on extracting valuable insights and knowledge from the data. Each
element plays a specific role:
• Data Analytics: Involves the use of statistical analysis and algorithms to discover patterns,
correlations, and trends within the data. It includes descriptive analytics (what happened),
diagnostic analytics (why it happened), predictive analytics (what might happen), and
prescriptive analytics (what action to take); the first and third are illustrated in the sketch
after this list.
• Business Intelligence (BI): Encompasses tools, processes, and technologies that transform
raw data into actionable insights for decision-making. BI tools provide dashboards, reports,
and visualizations to facilitate data-driven decision-making across an organization.
• Knowledge Discovery (KD): Involves the process of uncovering hidden patterns, trends,
and knowledge from large datasets. Machine learning algorithms, data mining techniques,
and advanced analytics contribute to knowledge discovery, allowing organizations to make
informed strategic decisions.
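
To ground these ideas, here is a minimal Python sketch that performs descriptive analytics (summary statistics) and a naive form of predictive analytics (a linear trend fit) on a toy dataset. The numbers are made up for illustration.

```python
import numpy as np

# Toy daily sales figures (hypothetical data for illustration).
sales = np.array([120.0, 135.0, 128.0, 150.0, 162.0, 158.0, 171.0])
days = np.arange(len(sales))

# Descriptive analytics: what happened?
print(f"mean={sales.mean():.1f}, std={sales.std():.1f}, max={sales.max():.1f}")

# Predictive analytics (naive): fit a linear trend, project the next day.
slope, intercept = np.polyfit(days, sales, deg=1)
next_day = slope * len(sales) + intercept
print(f"trend: +{slope:.1f}/day, projected next value: {next_day:.1f}")
```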

10 V’s OF BIG DATA:


Volume:
Refers to the sheer size of the data generated and collected.
Velocity:
Describes the speed at which data is generated, processed, and analyzed, especially in real-time
scenarios.
Variety:
Encompasses the different types of data, including structured, semi-structured, and unstructured
data.
Veracity:
Deals with the quality and reliability of the data.
Value:
Focuses on the importance of extracting meaningful insights and value from the data.
Variability:
Describes the inconsistency of data flow, both in terms of arrival times and types.
Visualization:
Involves presenting data in a way that is easily understandable and aids in decision-making.
Volatility:
Represents how long data is relevant and how long it should be stored.
Vulnerability:
Relates to the security and protection of data.
Validity:
Addresses the accuracy and trustworthiness of data.
