Mittal School of Business: Course Code: CAP348 Course Title: Introduction To Big Data


MITTAL SCHOOL OF BUSINESS

Course Code: CAP348 Course Title: Introduction to Big Data

Academic Task No: 1 Academic Task Title: REPORT BASE

Date of Allotment: 25/01/2021 Date of Submission: 16/02/2021

Name and Roll No: Nitin Patidar, B35 Section: Q1912


Ques.1 Define Big Data and explain the five Vs of Big Data.
Ans.1 Big data is a term for the huge volume of data, both structured and unstructured, that
floods a business on a day-to-day basis. It is also a comparatively modern field of data science
that explores how large data sets can be broken down and analyzed in order to systematically
extract insights and information from them. Conventional data processing solutions are not well
suited to capturing, storing and analyzing big data. There are five Vs of big data:
1. Velocity
2. Volume
3. Value
4. Variety
5. Validity
Velocity
Velocity refers to the speed at which data is generated, collected and analyzed. Data flows
continuously through multiple channels such as computer systems, networks, social media and
mobile phones. In today's data-driven business environment, the pace at which data grows can
best be described as 'torrential' and 'unprecedented'. This data should also be captured as
close to real time as possible, so that the right data is available at the right time. The speed
at which data can be accessed has a direct impact on making timely and accurate business
decisions. Even a limited amount of data that is available in real time can be more useful than
a large volume of data that takes a long time to capture and analyze.
Volume
Volume describes the 'amount' of data that is produced. The value of data also depends on its
size. Today data is generated from various sources in different formats, both structured and
unstructured. These formats include Word and Excel documents, PDFs and reports, along with
media content such as images and videos. Because of the data explosion caused by digital and
social media, data is being produced in such large quantities that it has become challenging for
enterprises to store and process it using conventional methods of business intelligence and
analytics.

Value
Although data is being produced in large volumes, just collecting it is of no use. Instead, data
from which business insights are derived adds 'value' to the company. In the context of big
data, value refers to how meaningful the data is in positively impacting a company's business.
This is where big data analytics comes into the picture. While many companies have invested in
establishing data aggregation and storage infrastructure in their organizations, they fail to
understand that aggregating data does not by itself add value. With the help of advanced data
analytics, useful insights can be derived from the collected data.
Variety
While the volume and velocity of data are important factors that add value to a business, big
data also involves processing various data types collected from diverse data sources. The
sources may be external as well as internal. Generally, big data is classified as structured
data and unstructured data. Structured data is data whose format, length and volume are clearly
defined. Unstructured data is unorganized data that does not conform to traditional data
formats. Data produced via digital and social media (images, videos, tweets, etc.) can be
classified as unstructured data.
Validity
The validity of big data is the assurance of the quality or credibility of the collected data.
Validity means that the data is correct and accurate for its intended use. Both the big data
sources and the subsequent analysis must be accurate if you are to use the results for decision
making.
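
As an illustration of what checking validity can look like in practice, here is a minimal Python
sketch. The record layout (order_id, amount, order_date) and the rules are assumptions made up
for this example; they are not taken from any particular system.

```python
from datetime import datetime

# Hypothetical rule set for an illustrative "customer order" record.
REQUIRED_FIELDS = ("order_id", "amount", "order_date")

def validate_record(record):
    """Return a list of validity problems found in one record (empty if valid)."""
    problems = []
    # Rule 1: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Rule 2: the amount must be a non-negative number.
    amount = record.get("amount")
    if amount not in (None, ""):
        try:
            if float(amount) < 0:
                problems.append("negative amount")
        except (TypeError, ValueError):
            problems.append("amount is not a number")
    # Rule 3: the date must follow the (assumed) YYYY-MM-DD format.
    date_text = record.get("order_date")
    if date_text:
        try:
            datetime.strptime(date_text, "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append("order_date is not YYYY-MM-DD")
    return problems

good = {"order_id": "A1", "amount": "19.99", "order_date": "2021-02-16"}
bad = {"order_id": "", "amount": "-5", "order_date": "16/02/2021"}
print(validate_record(good))
print(validate_record(bad))
```

Records that return an empty list could be passed on for analysis, while the others could be
corrected or set aside first.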

Ques.2 How is Hadoop related to Big Data? Describe its components.


Ans.2 Big data simply means the large sets of data that businesses and other parties put
together to serve specific goals and operations. Big data can include many different kinds of
data in many different formats.
Hadoop is one of the tools designed to handle big data. Hadoop and other software products work
to interpret or analyze the results of big data searches through specific proprietary algorithms
and methods. Hadoop is an open-source framework (not a database) under the Apache license that
is maintained by a worldwide community of users. Its main components include a MapReduce set of
functions and the Hadoop Distributed File System (HDFS).

Components of Hadoop

1. Hadoop HDFS

Hadoop Distributed File System (HDFS) is the storage unit of Hadoop. HDFS is specially
designed for storing massive datasets on commodity hardware. Hadoop lets you use commodity
machines as your data nodes, so you don't have to spend millions of dollars on the data nodes
alone.
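
To make the idea of storing one large file across several commodity machines concrete, here is
a small Python sketch. It assumes a block size of 128 MB and a replication factor of 3 (the
usual HDFS defaults), and it places replicas round-robin, which is a simplification: real HDFS
placement is rack-aware. The function and node names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128      # assumed block size (the usual HDFS default)
REPLICATION_FACTOR = 3   # assumed replication factor (the usual HDFS default)

def plan_blocks(file_size_mb, data_nodes):
    """Split a file into blocks and assign each block's replicas to data nodes.

    Round-robin placement is used for illustration only.
    """
    if REPLICATION_FACTOR > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    num_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    plan = []
    for block in range(num_blocks):
        # Each replica of a block goes to a different node.
        replicas = [data_nodes[(block + r) % len(data_nodes)]
                    for r in range(REPLICATION_FACTOR)]
        plan.append((block, replicas))
    return plan

# A 300 MB file needs three blocks; every block is stored on three nodes.
for block, replicas in plan_blocks(300, ["node1", "node2", "node3", "node4"]):
    print(f"block {block}: {replicas}")
```

Because every block lives on several machines, losing one inexpensive data node does not lose
the data.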

2. Hadoop MapReduce

Hadoop MapReduce is the processing unit of Hadoop. In the MapReduce approach, the processing is
done at the slave nodes, and the final result is sent to the master node.
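
The MapReduce idea can be sketched with the classic word-count example. The code below is a
single-process Python simulation, not the Hadoop API: the function names are made up for
illustration, and in a real cluster the map and reduce steps would run in parallel on different
nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce over a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(word_count(["big data", "Big value"]))
```

The map step works on each line independently, which is what allows the work to be spread over
many nodes before the results are combined.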

3. Hadoop YARN

Hadoop YARN stands for Yet Another Resource Negotiator. It is the resource management unit of
Hadoop and is available as a component of Hadoop version 2. Hadoop YARN acts like an operating
system for Hadoop. It is not a file system itself; it is a layer that works above HDFS and
allocates resources to processing jobs. It is responsible for managing cluster resources to
make sure no single machine is overloaded.

Ques.3 Where does Big Data come from?

Ans.3 Big data comes from many different sources, such as business transaction systems,
customer databases, medical records, internet clickstream logs, mobile applications, social
networks, scientific research repositories, machine-generated data and real-time data sensors
used in the Internet of Things. Some of the main types of data collected are:

Social data arises from the Likes, Tweets and Retweets, Comments, Video Uploads, and general
media that are uploaded and shared via the world's favorite social media platforms. The public
web is another good source of social data, and tools like Google Trends can be used to good
effect to increase the volume of big data.
Machine data is defined as information generated by industrial equipment, sensors installed in
machinery, and even web logs that track user behavior. Sensors such as medical devices, smart
meters, road cameras, satellites, games and the rapidly growing Internet of Things will deliver
high velocity, value, volume and variety of data in the very near future.
Transactional data is generated from all the daily transactions that take place both online and
offline. Invoices, payment orders, storage records, delivery receipts and the like are
characterized as transactional data.

Ques.4 Define data mining and explain different categories of data mining with examples.
Ans.4 Data mining is a process used by companies to turn raw data into useful information. By
using software to look for patterns in large batches of data, businesses can learn more about
their customers and develop more effective marketing strategies, which helps to increase sales
and decrease costs. Data mining depends on effective data collection, warehousing, and computer
processing. The main types of data that can be mined are:
1. Data stored in the database
A database is also called a database management system or DBMS. Every DBMS stores data that
are related to each other in one way or another. It also has a set of software programs that
are used to manage data and provide easy access to it. These programs serve many purposes,
including defining the structure of the database, making sure that the stored information
remains secure and consistent, and managing different types of data access, such as shared,
distributed, and concurrent.
2. Data warehouse
A data warehouse is a single data storage location that gathers data from multiple sources and
stores it under a unified schema. When data is loaded into a data warehouse, it undergoes
cleaning, integration, loading, and refreshing. Data stored in a data warehouse is organized in
several parts. If you need information on data that was stored 6 or 12 months back, you will
get it in the form of a summary.

3. Transactional data
A transactional database stores records that are captured as transactions. These transactions
include flight bookings, customer purchases, clicks on a website, and others. Every transaction
record has a unique ID. It also lists all the items that made up the transaction.
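
As a small example of mining patterns from transactional data, the Python sketch below counts
which pairs of items are bought together most often. The transactions, item names and the
min_support threshold are hypothetical values chosen for illustration; full data mining tools
use more elaborate algorithms than this simple pair count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each has a unique ID and the items it contains.
transactions = {
    "T1": ["bread", "milk", "eggs"],
    "T2": ["bread", "milk"],
    "T3": ["milk", "eggs"],
    "T4": ["bread", "milk", "butter"],
}

def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support transactions."""
    counts = Counter()
    for items in transactions.values():
        # sorted(set(...)) drops duplicates and gives each pair one canonical order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(frequent_pairs(transactions, min_support=2))
```

A business could use a result like this to place frequently paired products together or to
offer them in a bundle.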
Ques.5 How is big data analysis helpful in increasing business revenue?
Ans.5 Over the past few years, there has been a gradual shift of businesses toward big data.
This shift from the traditional approach to big data is due to its usefulness in increasing
business revenue. There are multiple ways in which big data helps in increasing business
revenue.
Ways in which Big Data Helps in Increasing Business Revenue:
• Arrangement of Data
The first way is arranging data with your business goals in mind. Big data helps you organize
the data with respect to your business goals and needs. It gives you a clear picture of your
data, which allows you to make or change innovative business strategies and to invest wisely
while making beneficial decisions for your business.

• Improved Management
One of the main reasons for using big data is that it improves management and saves time. It
analyzes the business data in ways that help business processes run efficiently. Big data also
supports SWOT (Strengths, Weaknesses, Opportunities, and Threats) analysis.

• Hike in Sales
Big data analyzes relationships in the data, which helps your organization to improve sales. It
uses the data to provide your target audience with the services they need by analyzing their
needs. This, in turn, helps to increase your business revenue in the most effective manner.
• Encouraging Personalization
Big data plays a vital role in improving Customer Relationship Management (CRM). It provides
better customization and personalization. Big data provides a better understanding of
customers, which helps you to develop business strategies with a personalized touch that will
capture the attention of your consumers.
• Efficient Advertising
Big data helps in understanding customer behaviour better, and it also helps in providing
better ideas for effective and efficient promotion. It helps in finding the best location and
also provides support in driving engagement and sales through creative and unique advertising
techniques that help in generating traffic to the website.
• Predicts the Buyer Behaviour
The analysis provided by big data is helpful not only in understanding the market but also in
understanding the behaviour of consumers. Accurate prediction of buyer behaviour helps in
growing the business through strategies that are aimed at the target audience. The findings
from analysis of data collected over time also help in predicting the success of a campaign and
the results that it will generate.
