Practice Set Data Analytics

Data Analytics and Reporting (PCC-CSD503)
1. How many types of data are available in data analytics, and what are they?
2. Give an overview of the types of data analytics and describe each one.
3. Describe exploratory data analysis (EDA) and its various forms.
4. Discuss the characteristics of big data.
5. Explain what a data warehouse is and how business intelligence relates to it.
6. Discuss the characteristics of data extraction.
7. Discuss the data stack.
8. Provide an example of how data analytics is used.
9. Discuss the applications of big data.
10. Give examples of the tools used in big data.
11. Explain how Hadoop ensures that data storage is fault-tolerant.
12. Differentiate between cloud technology and big data.
13. Discuss cloud computing in relation to big data.
14. Discuss HDFS and its main function within the Hadoop ecosystem.
15. Give a concrete example of MapReduce's data processing capabilities in the context of Hadoop.
16. Describe the benefits of Hadoop.
17. Examine how business intelligence helps companies make data-driven decisions.
18. List the drawbacks of Hadoop.
19. Give an example of data deserialization in big data and explain why it is a necessary step in data processing.
20. Describe the role of data mining in business intelligence.
21. Define business intelligence (BI) in relation to data analytics.
22. Describe the primary objective of business intelligence in terms of data analytics.
23. Discuss a few popular data sources that are analyzed by BI tools.
24. Describe the distinctions between business intelligence and traditional reporting.
25. Identify a crucial component of a business intelligence system that is used for reporting and data visualization.
26. Describe the advantages of self-service BI tools for businesses.
27. In the context of business intelligence, describe the idea of OLAP (Online Analytical Processing).
28. Describe Hadoop and the problems it solves in processing large amounts of data.
29. What is the main programming model that Hadoop uses to process data?
30. Describe the two main Hadoop components.
31. Explain the functions of the NameNode and DataNode in the Hadoop Distributed File System (HDFS).
32. Explain the importance of the "shuffle and sort" phase in Hadoop's MapReduce process.
33. Describe the Hadoop concept of data locality.
34. List a few alternatives to Hadoop in the big data ecosystem.
35. What role does Hadoop play in the scalability of big data processing?
36. Give examples of a few common Hadoop use cases for big data applications.
37. In the context of big data, distinguish between structured and unstructured data.
38. What role does data preprocessing play in big data analytics?
39. Explain the scales of data measurement.
40. Discuss the types of big data.
41. Explain the different stages of data analytics.
42. What is data analytics? Describe its significance.
43. Describe the procedures for exploratory data analysis (EDA).
44. What does business intelligence mean? Give examples of its advantages.
45. Describe Hadoop. Analyze the Hadoop components.
46. Describe the Hadoop ecosystem.
47. Discuss data serialization in relation to big data. Why does big data processing require data serialization?
48. Name the main benefits of processing large amounts of data with Hadoop and MapReduce.
49. Discuss the functions of the mapper and reducer in MapReduce.
50. Analyze how data analytics is used in a variety of sectors, including e-commerce, finance, and healthcare. How has data analytics changed operations and decision-making in these sectors?
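Questions 15, 32, and 49 refer to the map, shuffle-and-sort, and reduce phases of MapReduce. The following is a minimal single-process Python sketch of the classic word-count job that shows how the three phases fit together. It is an illustration under simplifying assumptions: everything runs in one process, and the function names `mapper`, `shuffle_and_sort`, and `reducer` are hypothetical. It does not use Hadoop's Java API or Hadoop Streaming.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle_and_sort(pairs):
    """Shuffle-and-sort phase: sort pairs by key, then group the values per key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(word, counts):
    """Reduce phase: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    # Each line stands in for one record of an input split; in Hadoop,
    # many mapper tasks would process different splits in parallel.
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle_and_sort(mapped))


if __name__ == "__main__":
    docs = ["big data needs big storage", "Hadoop stores big data"]
    print(word_count(docs))
    # → {'big': 3, 'data': 2, 'hadoop': 1, 'needs': 1, 'storage': 1, 'stores': 1}
```

In a real Hadoop job, the framework itself performs the shuffle-and-sort step between the map and reduce phases and spreads the map and reduce tasks across the nodes of the cluster; only the mapper and reducer logic is written by the user.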
1. What is data analytics?
A) Storing data for future use
B) Analyzing data to extract meaningful insights
C) Creating data visualizations
D) Data collection and reporting
2. Which of the following is not a common data analytics technique?
A) Regression analysis
B) Machine learning
C) Descriptive statistics
D) Database management
3. What is the primary goal of data preprocessing in data analytics?
A) Finding hidden patterns in data
B) Cleaning and transforming data for analysis
C) Generating data visualizations
D) Collecting data from various sources
4. Which statistical measure is used to describe the spread or dispersion of data?
A) Mean
B) Median
C) Variance
D) Mode
5. Which of the following is an example of supervised learning in machine learning?
A) Clustering
B) Regression
C) Anomaly detection
D) Principal component analysis
6. What is the purpose of data normalization in data analytics?
A) Reducing the dimensionality of data
B) Scaling data to a common range
C) Filling missing data with zeros
D) Creating data visualizations
7. Which data visualization type is best suited for showing the distribution of a single variable?
A) Line chart
B) Scatter plot
C) Histogram
D) Pie chart
8. What is the main difference between structured and unstructured data?
A) Structured data is stored in databases, while unstructured data is not.
B) Structured data is easy to analyze, while unstructured data is not.
C) Structured data is text-based, while unstructured data is numeric.
D) Structured data has a defined format, while unstructured data does not.
9. Which statistical test is commonly used to determine if there is a significant difference between two or more groups?
A) T-test
B) Chi-squared test
C) ANOVA (Analysis of Variance)
D) Pearson correlation
10. In data analytics, what is the term used to describe the process of combining data from multiple sources to create a single, unified dataset?
A) Data visualization
B) Data exploration
C) Data integration
D) Data aggregation
11. What is the primary goal of data analytics?
A) Data collection
B) Data storage
C) Data visualization
D) Extracting valuable insights from data
12. Which of the following data types is typically quantitative in nature?
A) Names of products
B) Colors of cars
C) Age of customers
D) Types of fruits
13. Which statistical measure describes the average value of a dataset?
A) Median
B) Mode
C) Range
D) Mean
14. What is the process of transforming and cleaning data before analysis called?
A) Data visualization
B) Data aggregation
C) Data wrangling
D) Data summarization
15. Which data visualization technique is used to display the distribution of a dataset and identify outliers?
A) Pie chart
B) Scatter plot
C) Histogram
D) Bar chart
16. In a regression analysis, what is the variable being predicted called?
A) Independent variable
B) Control variable
C) Dependent variable
D) Confounding variable
17. Which of the following is an example of supervised learning in machine learning?
A) Image classification
B) Anomaly detection
C) Clustering
D) Association rule mining
18. What is the term for the process of finding patterns and relationships in data?
A) Data cleaning
B) Data visualization
C) Data exploration
D) Data modeling
19. Which statistical test is used to determine if there is a significant difference between the means of two or more groups?
A) T-test
B) Chi-squared test
C) ANOVA (Analysis of Variance)
D) Regression analysis
20. What is the primary goal of A/B testing in data analytics?
A) Data collection
B) Hypothesis testing
C) Data visualization
D) Model building and training
21. What is data analytics?
A. The process of collecting data
B. The process of cleaning and storing data
C. The process of analyzing data to gain insights
D. The process of visualizing data
22. Which type of data analytics focuses on explaining why something happened?
A. Descriptive analytics
B. Predictive analytics
C. Prescriptive analytics
D. Diagnostic analytics
23. What is the primary goal of exploratory data analysis (EDA)?
A. Predict future events
B. Clean and preprocess data
C. Discover patterns and relationships in data
D. Build machine learning models
24. Which of the following is a key step in data preprocessing for analytics?
A. Visualization
B. Data collection
C. Data transformation
D. Hypothesis testing
25. In supervised machine learning, what is the role of the target variable?
A. It is the variable to be predicted.
B. It is a feature used to train the model.
C. It is not used in the modeling process.
D. It is the input to the model.
26. What is the main purpose of clustering in data analytics?
A. Predicting future values
B. Grouping similar data points together
C. Analyzing time series data
D. Visualizing data
27. Which two of the following statistical measures are used to assess the central tendency of a dataset?
A. Standard deviation
B. Variance
C. Mean
D. Median
28. Which data visualization technique is best for showing the distribution of a single numerical variable?
A. Pie chart
B. Histogram
C. Scatter plot
D. Bar chart
29. What does the term "data wrangling" refer to in data analytics?
A. The process of cleaning and transforming raw data
B. The process of collecting data
C. The process of analyzing data
D. The process of visualizing data
30. Which programming language is commonly used for data analysis and machine learning?
A. Python
B. C++
C. Java
D. Ruby
31. What is the primary objective of Hadoop?
A. Real-time data processing
B. Storage and processing of large volumes of data
C. Streaming media content delivery
D. Database management
32. Which component of Hadoop is responsible for storing and managing data in a distributed file system?
A. HDFS (Hadoop Distributed File System)
B. YARN (Yet Another Resource Negotiator)
C. MapReduce
D. Pig
33. What is the primary programming model for data processing in Hadoop?
A. SQL
B. MapReduce
C. NoSQL
D. Java
34. In Hadoop, what is the role of the Map phase in the MapReduce framework?
A. Data splitting and sorting
B. Data aggregation
C. Data storage in HDFS
D. Data visualization
35. Which of the following is not a characteristic of Hadoop HDFS?
A. Data replication for fault tolerance
B. Block-based storage
C. High-speed real-time data processing
D. Scalability
36. What is the purpose of the YARN (Yet Another Resource Negotiator) component in Hadoop?
A. Data storage
B. Data analysis
C. Resource management and job scheduling
D. Data visualization
37. Which Hadoop ecosystem project is designed for querying and analyzing large datasets stored in Hadoop using SQL-like queries?
A. HBase
B. Pig
C. Hive
D. Mahout
38. What is the default HDFS block size in Hadoop 2.x and later?
A. 1 megabyte
B. 64 megabytes
C. 128 megabytes
D. 256 megabytes
39. Which Hadoop ecosystem component is used for processing large-scale, complex data pipelines in a directed acyclic graph (DAG) structure?
A. HBase
B. Pig
C. Spark
D. Tez
40. What is the purpose of Hadoop Streaming in the context of Hadoop MapReduce?
A. A data streaming service
B. A tool for analyzing network traffic
C. A way to use non-Java programs in MapReduce jobs
D. A method for real-time data processing
41. What is the defining characteristic of Big Data?
A. Large volume of data
B. High-velocity data streams
C. Data with a variety of formats
D. All of the above
42. Which of the following is an example of structured data in the context of Big Data?
A. Social media posts
B. Sensor data from IoT devices
C. Customer transaction records
D. Image files
43. Which set of V's is most often used to describe the key characteristics of Big Data?
A. Velocity, Variety, Value
B. Volume, Velocity, Variety
C. Velocity, Veracity, Value
D. Volume, Variety, Veracity
44. In the context of Big Data, what is data veracity?
A. The volume of data generated
B. The trustworthiness of data
C. The variety of data sources
D. The speed at which data is generated
45. What is the primary purpose of a data lake in the context of Big Data?
A. Data storage for structured data
B. Data storage for unstructured and semi-structured data
C. Real-time data processing
D. Data warehousing for historical data
46. Which programming model is commonly used for distributed data processing in the Big Data ecosystem?
A. Java
B. Python
C. MapReduce
D. SQL
47. Which open-source distributed storage system is often used in Big Data environments to handle large datasets?
A. Hadoop HDFS
B. MongoDB
C. Apache Kafka
D. Microsoft SQL Server
48. What is the primary goal of data preprocessing in the context of Big Data analytics?
A. Reducing data volume
B. Ensuring data is clean and ready for analysis
C. Aggregating data into a single repository
D. Applying machine learning algorithms
49. What is the primary challenge associated with data integration in a Big Data environment?
A. Data duplication
B. Data privacy concerns
C. Data loss during transfer
D. Lack of data variety
50. Which technology or technique is commonly used for handling real-time data processing and stream analytics in Big Data systems?
A. Batch processing
B. MapReduce
C. Apache Spark
D. Hadoop
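Several of the multiple-choice questions above (4, 6, 13, and 27) turn on the difference between measures of central tendency, measures of dispersion, and normalization. The following is a minimal Python sketch, using only the standard-library `statistics` module, that computes each measure for a small made-up list of customer ages and then applies min-max scaling to the range [0, 1]. The sample data and the helper names `describe` and `min_max_normalize` are hypothetical, and min-max scaling is only one common normalization technique (z-score standardization is another).

```python
import statistics


def describe(values):
    """Return central-tendency and dispersion measures for a list of numbers."""
    return {
        # Central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Dispersion: population variance (statistics.variance gives the sample variance)
        "variance": statistics.pvariance(values),
        "range": max(values) - min(values),
    }


def min_max_normalize(values):
    """Scale values linearly into the common range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to scale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    ages = [22, 25, 25, 31, 47]  # hypothetical sample data
    print(describe(ages))
    print(min_max_normalize(ages))
```

For this sample, the mean (30) and median (25) differ because the single large value 47 pulls the mean upward, which is one reason both measures of central tendency are commonly reported together.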
