Practice Set Data Analytics

Data Analytics And Reporting (PCC-CSD503)

1. How many types of data are available in data analytics, and what are
they?
2. Give an overview of the types of data analytics and describe each.
3. Describe EDA and its various forms.
4. Talk about the features of big data.
5. Explain what a data warehouse is and how business intelligence relates
to it.
6. Discuss the characteristics of data extraction.
7. Talk about the Data Stack.
8. Provide an instance of how data analytics is used.
9. Talk about the application of big data.
10. Give examples of the tools used in big data.
11. Explain how Hadoop ensures that data storage is fault-tolerant.
12. Differentiate between cloud technology and big data.
13. Discuss cloud computing in contrast to big data.
14. Talk about HDFS and its main function within the Hadoop ecosystem.
15. Give a concrete instance of MapReduce's data processing capabilities in
the context of Hadoop.
16. Describe the benefits of Hadoop.
17. Examine how business intelligence helps companies make data-driven
decisions.
18. List the drawbacks of Hadoop.
19. Give an example of data deserialization in big data and explain why it is a
necessary step in the data processing pipeline.
20. Describe the role of data mining in business intelligence.
21. Give the definition of business intelligence (BI) in relation to data
analytics.
22. Describe business intelligence's primary objective in terms of data
analytics.
23. Talk about a few popular data sources that are analyzed by BI tools.
24. Describe the distinctions between business intelligence and traditional
reporting.
25. Identify a crucial part of a business intelligence system that is used for
reporting and data visualization.
26. Describe the advantages of self-service BI tools for businesses.
27. In the context of business intelligence, describe the idea of OLAP (Online
Analytical Processing).
28. Describe Hadoop and the issues it resolves with processing large
amounts of data.
29. What is the main programming model that Hadoop uses to process
data?
30. Describe the two main Hadoop components.
31. Explain the functions of the NameNode and DataNode in the Hadoop
Distributed File System (HDFS).
32. Explain the importance of the "shuffle and sort" phase of Hadoop's
MapReduce process.
33. Describe the Hadoop concept of data locality.
34. List a few alternatives to Hadoop in the big data ecosystem.
35. What role does Hadoop play in the scalability of big data processing?
36. Give a few examples of common Hadoop use cases for big data apps.
37. In the context of big data, distinguish between structured and
unstructured data.
38. What part does data preprocessing play in big data analytics?
39. Explain the scales of data measurement.
40. Talk about the types of big data.
41. Explain the different stages of data analytics.
42. What is data analytics? Describe the significance of data analytics.
43. Describe the procedures for exploratory data analysis (EDA).
44. What does business intelligence mean? Give examples of business
intelligence's advantages.
45. Describe Hadoop. Analyze the Hadoop components.
46. Describe the Hadoop ecosystem.
47. Talk about data serialization in relation to big data. Why does big data
processing require data serialization?
48. Name the main benefits of processing large amounts of data with
Hadoop and MapReduce.
49. Talk about the functions of the mapper and reducer in MapReduce.
50. Analyze how data analytics is used in a variety of sectors, including
e-commerce, finance, and healthcare. How have operations and
decision-making in these sectors been changed by data analytics?
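
Several of the questions above (15, 32, and 49) concern the mapper, the "shuffle and sort" phase, and the reducer. As a study aid, the following is a minimal single-process sketch of those three roles using word counting. It is plain Python and does not use Hadoop; the function names (mapper, shuffle_and_sort, reducer, word_count) and the sample lines are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle and sort: group all values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(key, values) for key, values in shuffle_and_sort(mapped))


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    sample = ["big data needs big storage", "data locality helps"]
    print(word_count(sample))
```

In a real cluster the mappers and reducers would run on different nodes, and the shuffle step would move data across the network so that every value for a given key reaches the same reducer; the sketch only imitates that grouping in memory.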

1. What is data analytics?
a) Storing data for future use
b) Analyzing data to extract meaningful insights
c) Creating data visualizations
d) Data collection and reporting

2. Which of the following is not a common data analytics technique?
a) Regression analysis
b) Machine learning
c) Descriptive statistics
d) Database management

3. What is the primary goal of data preprocessing in data analytics?
a) Finding hidden patterns in data
b) Cleaning and transforming data for analysis
c) Generating data visualizations
d) Collecting data from various sources

4. Which statistical measure is used to describe the spread or dispersion of
data?
a) Mean
b) Median
c) Variance
d) Mode

5. Which of the following is an example of supervised learning in machine
learning?
a) Clustering
b) Regression
c) Anomaly detection
d) Principal component analysis

6. What is the purpose of data normalization in data analytics?
a) Reducing the dimensionality of data
b) Scaling data to a common range
c) Filling missing data with zeros
d) Creating data visualizations

7. Which data visualization type is best suited for showing the distribution
of a single variable?
a) Line chart
b) Scatter plot
c) Histogram
d) Pie chart

8. What is the main difference between structured and unstructured data?
a) Structured data is stored in databases, while unstructured data is not.
b) Structured data is easy to analyze, while unstructured data is not.
c) Structured data is text-based, while unstructured data is numeric.
d) Structured data has a defined format, while unstructured data does not.

9. Which statistical test is commonly used to determine if there is a
significant difference between two or more groups?
a) T-test
b) Chi-squared test
c) ANOVA (Analysis of Variance)
d) Pearson correlation

10. In data analytics, what is the term used to describe the process of
combining data from multiple sources to create a single, unified dataset?
a) Data visualization
b) Data exploration
c) Data integration
d) Data aggregation

11. What is the primary goal of data analytics?
A) Data collection
B) Data storage
C) Data visualization
D) Extracting valuable insights from data

12. Which of the following data types is typically quantitative in nature?
A) Names of products
B) Colors of cars
C) Age of customers
D) Types of fruits

13. Which statistical measure describes the average value of a dataset?
A) Median
B) Mode
C) Range
D) Mean

14. What is the process of transforming and cleaning data before analysis
called?
A) Data visualization
B) Data aggregation
C) Data wrangling
D) Data summarization

15. Which data visualization technique is used to display the distribution of a
dataset and identify outliers?
A) Pie chart
B) Scatter plot
C) Histogram
D) Bar chart

16. In a regression analysis, what is the variable being predicted or
explained called?
A) Independent variable
B) Control variable
C) Dependent variable
D) Confounding variable

17. Which of the following is an example of supervised learning in machine
learning?
A) Image classification
B) Anomaly detection
C) Clustering
D) Association rule mining

18. What is the term for the process of finding patterns and relationships in
data?
A) Data cleaning
B) Data visualization
C) Data exploration
D) Data modeling

19. Which statistical test is used to determine if there is a significant
difference between the means of two or more groups?
A) T-test
B) Chi-squared test
C) ANOVA (Analysis of Variance)
D) Regression analysis

20. What is the primary goal of A/B testing in data analytics?
A) Data collection
B) Hypothesis testing
C) Data visualization
D) Model building and training

21. What is data analytics?
A. The process of collecting data
B. The process of cleaning and storing data
C. The process of analyzing data to gain insights
D. The process of visualizing data

22. Which of the following is not a type of data analytics?
A. Descriptive analytics
B. Predictive analytics
C. Prescriptive analytics
D. Diagnostic analytics

23. What is the primary goal of exploratory data analysis (EDA)?
A. Predict future events
B. Clean and preprocess data
C. Discover patterns and relationships in data
D. Build machine learning models

24. Which of the following is a key step in data preprocessing for analytics?
A. Visualization
B. Data collection
C. Data transformation
D. Hypothesis testing

25. In supervised machine learning, what is the role of the target variable?
A. It is the variable to be predicted.
B. It is a feature used to train the model.
C. It is not used in the modeling process.
D. It is the input to the model.

26. What is the main purpose of clustering in data analytics?
A. Predicting future values
B. Classifying data points into groups
C. Analyzing time series data
D. Visualizing data

27. Which statistical measure is used to assess the central tendency of a
dataset?
A. Standard deviation
B. Variance
C. Mean
D. Median

28. Which data visualization technique is best for showing the distribution of
a single numerical variable?
A. Pie chart
B. Histogram
C. Scatter plot
D. Bar chart

29. What does the term "data wrangling" refer to in data analytics?
A. The process of cleaning and transforming raw data
B. The process of collecting data
C. The process of analyzing data
D. The process of visualizing data

30. Which programming language is commonly used for data analysis and
machine learning?
A. Python
B. C++
C. Java
D. Ruby

31. What is the primary objective of Hadoop?
A. Real-time data processing
B. Storage and processing of large volumes of data
C. Streaming media content delivery
D. Database management

32. Which component of Hadoop is responsible for storing and managing
data in a distributed file system?
A. HDFS (Hadoop Distributed File System)
B. YARN (Yet Another Resource Negotiator)
C. MapReduce
D. Pig

33. What is the primary programming model for data processing in Hadoop?
A. SQL
B. MapReduce
C. NoSQL
D. Java

34. In Hadoop, what is the role of the Map phase in the MapReduce
framework?
A. Data splitting and sorting
B. Data aggregation
C. Data storage in HDFS
D. Data visualization

35. Which of the following is not a characteristic of Hadoop HDFS?
A. Data replication for fault tolerance
B. Block-based storage
C. High-speed real-time data processing
D. Scalability

36. What is the purpose of the YARN (Yet Another Resource Negotiator)
component in Hadoop?
A. Data storage
B. Data analysis
C. Resource management and job scheduling
D. Data visualization

37. Which Hadoop ecosystem project is designed for querying and analyzing
large datasets stored in Hadoop using SQL-like queries?
A. HBase
B. Pig
C. Hive
D. Mahout

38. What is the default block size in HDFS (Hadoop 2.x and later)?
A. 1 megabyte
B. 64 megabytes
C. 128 megabytes
D. 256 megabytes

39. Which Hadoop ecosystem component is used for processing large-scale,
complex data pipelines in a directed acyclic graph (DAG) structure?
A. HBase
B. Pig
C. Spark
D. Tez

40. What is the purpose of Hadoop Streaming in the context of Hadoop
MapReduce?
A. A data streaming service
B. A tool for analyzing network traffic
C. A way to use non-Java programs in MapReduce jobs
D. A method for real-time data processing

41. What is the defining characteristic of Big Data?
A. Large volume of data
B. High-velocity data streams
C. Data with a variety of formats
D. All of the above

42. Which of the following is an example of structured data in the context of
Big Data?
A. Social media posts
B. Sensor data from IoT devices
C. Customer transaction records
D. Image files

43. Which set of V's is most often used to describe the key
characteristics of Big Data?
A. Velocity, Variety, Value
B. Volume, Velocity, Variety
C. Velocity, Veracity, Value
D. Volume, Variety, Veracity

44. In the context of Big Data, what is data veracity?
A. The volume of data generated
B. The trustworthiness of data
C. The variety of data sources
D. The speed at which data is generated

45. What is the primary purpose of a data lake in the context of Big Data?
A. Data storage for structured data
B. Data storage for unstructured and semi-structured data
C. Real-time data processing
D. Data warehousing for historical data

46. Which programming model is commonly used for distributed data
processing in the Big Data ecosystem?
A. Java
B. Python
C. MapReduce
D. SQL

47. Which open-source distributed storage system is often used in Big Data
environments to handle large datasets?
A. Hadoop HDFS
B. MongoDB
C. Apache Kafka
D. Microsoft SQL Server

48. What is the primary goal of data preprocessing in the context of Big Data
analytics?
A. Reducing data volume
B. Ensuring data is clean and ready for analysis
C. Aggregating data into a single repository
D. Applying machine learning algorithms

49. What is the primary challenge associated with data integration in a Big
Data environment?
A. Data duplication
B. Data privacy concerns
C. Data loss during transfer
D. Lack of data variety

50. Which technology or technique is commonly used for handling real-time
data processing and stream analytics in Big Data systems?
A. Batch processing
B. MapReduce
C. Apache Spark
D. Hadoop
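
Several of the multiple-choice questions above ask about measures of central tendency, measures of spread, and normalization to a common range. As a worked illustration, the sketch below computes those quantities with Python's standard statistics module. The ages list and the min_max_normalize helper are hypothetical, introduced only for this example, and min-max scaling is just one common form of normalization.

```python
import statistics


def min_max_normalize(values):
    """Scale values linearly to the range [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical customer ages, chosen only for illustration.
ages = [20, 30, 30, 40, 80]

summary = {
    "mean": statistics.mean(ages),           # central tendency (average)
    "median": statistics.median(ages),       # central tendency (middle value)
    "mode": statistics.mode(ages),           # most frequent value
    "variance": statistics.pvariance(ages),  # spread (population variance)
}

if __name__ == "__main__":
    print(summary)
    print(min_max_normalize(ages))
```

The single large value (80) pulls the mean above the median, which is one reason both measures are usually reported together.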
