Unit-II


What is the definition of big data?

Big data refers to extremely large and diverse collections of structured, unstructured,
and semi-structured data that continue to grow exponentially over time. These
datasets are so large and complex in volume, velocity, and variety that traditional
data management systems cannot store, process, or analyze them.

What are the sources of data?


In short, the sources of data are physical or digital places where information is
stored in a data table, data object, or some other storage format.

Data can be gathered from two places: internal and external sources. The
information collected from internal sources is called “primary data,” while the
information gathered from outside references is called “secondary data.”

1. Statistical data sources


Statistical data sources are surveys and other statistical reports used for official
purposes. Respondents are asked a series of questions, which can be either
qualitative or quantitative. Qualitative data is non-numeric, while quantitative
data is expressed in numbers.

Sampling can draw on both kinds of statistical data. A statistical survey is usually
carried out as a sample survey: data is collected from a sample and then analyzed
using a statistical analysis plan and appropriate techniques. Surveys can also be
conducted using the questionnaire method.
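As a rough illustration of a sample survey, the sketch below draws a simple random sample and compares its mean with the figure a full count of the population would give. The population values, the sample size, and the function name `sample_survey` are assumptions made up for the example.

```python
import random
import statistics

def sample_survey(population, sample_size, seed=0):
    """Draw a simple random sample without replacement and summarize it."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)
    return {
        "sample_size": len(sample),
        "sample_mean": statistics.mean(sample),
        "sample_stdev": statistics.stdev(sample),
    }

# Hypothetical population: household sizes (1 to 6) for 1,000 households.
population = [1 + i % 6 for i in range(1000)]

result = sample_survey(population, sample_size=50)
full_count_mean = statistics.mean(population)  # what surveying everyone would give
print(result["sample_mean"], full_count_mean)
```

The sample mean will usually be close to, but not exactly equal to, the full-count mean. That gap is the price paid for questioning 50 households instead of 1,000.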

2. Census data sources


In this method, the data are taken from a previously published census report.
Unlike a sample survey, the census method examines every member of the population
during the research process. The data is collected over a specific period, called
the reference time. The researchers carry out their research during that period and
then analyze the data to draw conclusions.

A census is carried out across the country for official purposes. Respondents are
asked questions, either in person or over the phone. However, a census takes a lot
of time and effort because it covers the whole population.

Additional sources of data


In addition to the data sources above, other origins are also considered when
collecting data. They are as follows:

1. Internal sources of data


Internal data sources are things like reports and records that are published within
the organization.

Internal data sources are used to do primary research on a given topic. As a
researcher, you can go to internal sources to get information, which makes the
study easier to carry out.

Examples of internal data include accounting resources, sales force reports,
internal experts, and miscellaneous reports. Practical business intelligence relies on
the synergy between analytics and reporting, where analytics uncovers valuable
insights, and reporting communicates these findings to stakeholders.

2. External sources of data


When data collection happens outside of the organization, it is called an external
data source. These sources are, in every way, outside of the company. As a
researcher, you can also collect data from external sources.

Data from external origins is harder to gather because it is much more varied and
there can be many sources. External data can be put into different groups, which
are given below:

 Government publications
Researchers can get a massive amount of information from government sources.
Also, you can get much of this information for free on the Internet.

 Non-government publications
Researchers can also find industry-related information in non-government
publications. The only research problem with non-government publications is that
their data may sometimes be biased.

 Syndicate services
Some companies offer syndicate services. As part of this, they collect and organize
the same marketing information for all their clients. They get information from
households through surveys, mail diary panels, electronic services, wholesalers,
industrial firms, retailers, and so on.

Why is data quality important?


Data quality is essential for one main reason: You give customers the best experience when
you make decisions using accurate data. A great customer experience leads to happy
customers, brand loyalty, and higher revenue for your business. If you’re using poor-quality
data, you’re mostly guessing at what your customers want. Worse still, you might be actively
doing things your customers dislike.

Data quality benefits


The best way to understand the importance of data quality is to take a look at the benefits you
get when you introduce a data quality strategy.

 Lower mailing costs: Accurate customer data reduces the amount of undeliverable
mail. Less undeliverable mail saves you money in postage cost—no more resending
packages that didn’t make it to their destinations. You may even qualify for discounts
on postage rates from the U.S. Postal Service if you consistently use accurate and
correctly formatted addresses.

 Improved customer relations: Reliable data lets you get to know your customers. This
keeps you from sending messages they don’t want and helps you spot their needs in
advance. By meeting—and exceeding—customer expectations, you create goodwill
and a strong relationship with your brand.

 More consistent data: Organizations with several points of entry for their customers
often face the problem of inconsistent data across the organization. Inconsistent data
leads to duplicate records, missed opportunities for messaging, and departmental data
silos. Part of managing data quality is making data available across your organization
to open communication between teams and keep records consistent.

How does the system handle missing or incomplete data?

One approach to handling missing values is deletion. However, this method may
result in information loss. Alternatively, imputation can be used to replace
missing values; the mean, median, or mode can be used for imputation, which helps
preserve the overall data structure.
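Mean, median, and mode imputation can be sketched in a few lines. This is a minimal illustration using Python's standard `statistics` module; the function name `impute`, the use of `None` to mark a missing value, and the sample column are assumptions for the example.

```python
import statistics

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    strategies = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    fill = strategies[strategy](observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 30, None, 46]  # hypothetical column with two gaps

mean_filled = impute(ages, "mean")
median_filled = impute(ages, "median")
mode_filled = impute(ages, "mode")
print(mean_filled, median_filled, mode_filled)
```

Here the observed values are 20, 30, 30, and 46, so the mean strategy fills the gaps with 31.5 while the median and mode strategies fill them with 30. Which one is appropriate depends on the data: the mean is sensitive to outliers, and the mode is the usual choice for categorical columns.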

What to do with missing data as a data analyst?


There are three main approaches to handling missing data: deletion, imputation, and
analysis. Deletion involves removing missing values or cases from your data set.
This can be a simple and fast solution, but it can also reduce your sample size,
introduce bias, or lose valuable information.
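The effect of deletion on sample size is easy to see in a small sketch. The example below removes every case (row) that has at least one missing field, often called listwise deletion; the records and the function name `listwise_delete` are hypothetical.

```python
def listwise_delete(rows):
    """Keep only the rows (dicts) in which no field is None."""
    return [row for row in rows if all(v is not None for v in row.values())]

# Hypothetical survey records; None marks a missing answer.
rows = [
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 41, "income": None},
    {"age": 33, "income": 61000},
]

complete = listwise_delete(rows)
print(len(rows), "->", len(complete))
```

Half of the cases are dropped here even though each discarded row still held one valid answer, which is the information loss described above.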

What are the 4 types of missing data?


There are four qualitatively distinct types of missing data. Missing data is
either: structurally missing, missing completely at random (MCAR), missing at
random (MAR), or nonignorable (also known as missing not at random, MNAR).
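The difference between two of these types can be shown with a small simulation. In this sketch, under stated assumptions, MCAR is modeled by dropping each value with the same fixed probability, and MNAR by withholding values that exceed a threshold. The data, the 30% rate, and the threshold of 70 are all made up for illustration.

```python
import random
import statistics

scores = list(range(1, 101))  # hypothetical complete data, true mean 50.5

# MCAR: every value has the same 30% chance of being missing.
rng = random.Random(42)
mcar_observed = [s for s in scores if rng.random() >= 0.3]

# MNAR: missingness depends on the value itself (scores above 70 are withheld).
mnar_observed = [s for s in scores if s <= 70]

true_mean = statistics.mean(scores)
mcar_mean = statistics.mean(mcar_observed)
mnar_mean = statistics.mean(mnar_observed)
print(true_mean, mcar_mean, mnar_mean)
```

Under MCAR the observed mean tends to stay near the true mean, because the missing values are a random subset. Under MNAR the observed mean is pulled well below it, which is why this type cannot simply be ignored.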

What is data visualization?


Data visualization is the process of using visual elements like charts, graphs,
or maps to represent data. It translates complex, high-volume, or numerical
data into a visual representation that is easier to process. Data visualization
tools improve and automate the visual communication process for accuracy
and detail. You can use the visual representations to extract actionable
insights from raw data.
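To make the idea concrete, here is a minimal sketch that turns a handful of numbers into a text bar chart without any plotting library. The function name `text_bar_chart` and the sales figures are hypothetical; a real project would more likely use a dedicated visualization tool.

```python
def text_bar_chart(data, width=40, symbol="#"):
    """Render a dict of label -> non-negative number as horizontal text bars."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Scale each bar relative to the largest value.
        bar_len = round(value / largest * width) if largest else 0
        lines.append(f"{label.ljust(label_width)} | {symbol * bar_len} {value}")
    return "\n".join(lines)

quarterly_sales = {"Q1": 120, "Q2": 180, "Q3": 90, "Q4": 240}  # hypothetical figures
chart = text_bar_chart(quarterly_sales)
print(chart)
```

Even in this crude form, the dip in Q3 and the peak in Q4 are visible at a glance, which is harder to see in the raw numbers alone.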

Why is data visualization important?


Modern businesses typically process large volumes of data from various data
sources, such as the following:

 Internal and external websites

 Smart devices

 Internal data collection systems

 Social media

But raw data can be hard to comprehend and use. Hence, data scientists
prepare and present data in the right context. They give it a visual form so
that decision-makers can identify the relationships between data and detect
hidden patterns or trends. Data visualization creates stories that advance
business intelligence and support data-driven decision-making and strategic
planning.

What are the benefits of data visualization?


Some benefits of data visualization are as follows:

Strategic decision-making

Key stakeholders and top management use data visualization to interpret data
meaningfully. They save time through faster data analysis and the ability to
visualize the bigger picture. For example, they can identify patterns, discover
trends, and gain insights to remain ahead of the competition.

Improved customer service

Data visualization highlights customer needs and wants through graphical
representation. You can identify gaps in your customer service, strategically
improve products or services, and reduce operational inefficiencies.

Increased employee engagement

Data visualization techniques are useful for communicating data analysis results
to a large team. The entire group can visualize data together to develop common
goals and plans. They can use visual analytics to measure goals and progress and
improve team motivation. For example, a sales team works together to increase the
height of their sales bar chart in one quarter.

What are the different types of data in business analytics?

Types of Data

Structured Data

Structured data is a type of data that is organized and easily managed using traditional
data management tools such as spreadsheets, databases, or tables. Structured data is
often quantitative, consisting of numbers, percentages, and other numerical values,
although it can also hold categories, dates, and short text fields. Because of its
organized nature, structured data is relatively easy to analyze using statistical
methods such as regression analysis or correlation analysis.
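As an example of correlation analysis on structured data, the sketch below computes the Pearson correlation coefficient between two numeric columns. The column names and values are hypothetical, and the function is a plain-Python illustration rather than a replacement for a statistics package.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant column gives zero variance and a division by zero.
    return cov / math.sqrt(var_x * var_y)

# Hypothetical columns from a sales table.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]  # exactly linear in ad_spend

r = pearson_r(ad_spend, sales)
print(r)
```

Because the second column is an exact linear function of the first, the coefficient comes out as 1. Real business data would normally give a value somewhere between -1 and 1.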

Unstructured Data

Unstructured data is data that does not have a predefined format or organization,
making it difficult to manage using traditional data management tools. Examples of
unstructured data include social media posts, emails, images, and videos.
Unstructured data is typically qualitative, meaning that it is descriptive and
narrative. Analyzing unstructured data requires the use of advanced analytics
techniques such as natural language processing (NLP) or sentiment analysis.

Semi-Structured Data

Semi-structured data is a type of data that has elements of both structured and
unstructured data. This type of data includes information that is partially organized,
but not to the extent that it can be classified as structured data. Examples of semi-
structured data include XML and JSON files, which have some organization but also
contain elements of unstructured data. Analyzing semi-structured data typically
requires a combination of traditional data management tools and advanced analytics
techniques.
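A short sketch shows what "partially organized" means in practice. The JSON records below share some fields but not others, and a small function maps them onto fixed columns so that ordinary tabular tools can take over. The records, field names, and the `flatten` helper are all hypothetical.

```python
import json

# Hypothetical JSON export: every record has id and name, other fields vary.
raw = """
[
  {"id": 1, "name": "Asha", "tags": ["vip"], "address": {"city": "Pune"}},
  {"id": 2, "name": "Ben"},
  {"id": 3, "name": "Chen", "address": {"city": "Delhi", "zip": "110001"}}
]
"""

def flatten(record):
    """Map one loosely structured record onto a fixed set of columns."""
    address = record.get("address", {})
    return {
        "id": record["id"],
        "name": record["name"],
        "city": address.get("city"),  # None when the field is absent
        "tag_count": len(record.get("tags", [])),
    }

rows = [flatten(record) for record in json.loads(raw)]
for row in rows:
    print(row)
```

The keys and nesting give the data some structure, but optional and nested fields mean it does not fit a table until a step like this decides how to handle what is missing.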

Big Data

Big Data is a term used to describe large and complex data sets that cannot be
processed using traditional data management tools. Big Data includes a variety of data
types, including structured, unstructured, and semi-structured data. A main
challenge of analyzing Big Data is its volume, as the amount of data is too large to be
analyzed manually or with conventional tools. Analyzing Big Data requires the use of
specialized tools and techniques such as Apache Hadoop or Apache Spark.

What is the PLS-SEM method?


Partial least squares path modeling, also called partial least squares structural
equation modeling (PLS-PM, PLS-SEM), is a method for structural equation modeling
that allows estimation of complex cause-effect relationships in path models with
latent variables.

What are the benefits of PLS-SEM?


PLS-SEM has the following advantages: (1) it integrates multiple data analysis
methods, so it can both find the functional relationship between independent
variables and dependent variables from the data and use the model to make
predictions,
