Business Analytics and Data Mining Modeling Using R
b. names(df)
c. df.names()
d. names("df")
4. Which of the following kinds of data is put into a formula to produce commonly accepted results?
a. Raw.
b. Processed.
c. Synchronized.
d. All of the above.
5. What would you use to compare the frequency distributions of more than one set of data?
a. Box plots.
b. Frequency distribution.
c. Frequency polygon.
d. Line graph.
6. Which of the following is considered best for prediction?
a. Linear regression
b. Logistic regression
c. CART
d. Naïve Bayes
7. Which of the following metrics measures the ‘goodness of fit’ of a regression model?
a. Mean absolute deviation
b. Root mean squared error
c. R-squared
d. The total sum of squared errors
8. Which statement is true about prediction problems?
a. The output attribute must be categorical.
b. The output attribute must be numeric.
c. The resultant model is designed to determine future outcomes.
d. The resultant model is designed to classify current behavior.
9. Which of the following assumptions of multiple linear regression can be relaxed in data mining?
a. Noise follows a normal distribution
b. Observations are independent
c. Linear relationship holds true
d. Heteroscedasticity
It is a way to convert a categorical variable into a series of dichotomous variables (variables that
can take a value of zero or one only). ... You can select any level of the categorical variable as the
reference level.
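The conversion described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the course's own R code; the function name `dummy_code`, the `is_` column prefix, and the sample `colors` data are all assumptions made for the example.

```python
def dummy_code(values, reference=None):
    """Turn a categorical column into 0/1 indicator columns.

    The reference level gets no column of its own: a row belonging to it
    is represented by zeros in every indicator column.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: default to the first level alphabetically
    if reference not in levels:
        raise ValueError("reference level not found in the data")
    kept = [level for level in levels if level != reference]
    return {
        "is_" + str(level): [1 if v == level else 0 for v in values]
        for level in kept
    }


# Hypothetical sample data: three levels, with "red" chosen as the reference.
colors = ["red", "green", "blue", "green"]
dummies = dummy_code(colors, reference="red")
for name, column in dummies.items():
    print(name, column)
```

A variable with k levels yields k - 1 indicator columns, so here three colors produce two columns.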
11. What is partitioning? Describe its types.
Partitioning is the process of writing the hard drive sectors that
will make up the partition table. It contains information on the
partition, including sector size, position with respect to the
primary partition, types of partitions present, operating systems
installed, etc. When a partition is created, it is given a volume
name, which allows it to be easily identified.
There are three types of partitions: primary partitions, extended
partitions and logical drives.
- A primary partition is a partition on which you can install an operating system. A
primary partition with an operating system installed on it is used when the
computer starts to load the OS.
- An extended partition is a partition that can be divided into additional logical
drives. Unlike a primary partition, you don't need to assign it a drive letter and
install a file system. Instead, you can use the operating system to create an
additional number of logical drives within the extended partition.
- A logical drive is a drive space that is logically created on top of a physical hard
disk drive. A logical drive is a separate partition with its own parameters and
functions, and it operates independently. A logical drive can also be called a
logical drive partition or logical disk partition.
12. What do you mean by dimension reduction techniques? Describe any two dimension reduction
techniques.
Dimensionality reduction refers to techniques that reduce the number of
input variables in a dataset.
Having more input features often makes a predictive modeling task more
challenging, a problem generally referred to as the curse of dimensionality.
High-dimensionality statistics and dimensionality reduction techniques are
often used for data visualization. Nevertheless, these techniques can be used in
applied machine learning to simplify a classification or regression dataset in
order to better fit a predictive model.
Principal Component Analysis for Breast Cancer Data with R and Python
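As an illustration of reducing the number of input variables, the sketch below applies principal component analysis to two-dimensional data and keeps only the first component. It is a minimal example in plain Python that assumes exactly two numeric features, so the principal axis can be found in closed form; the function name `pca_first_component` and the sample `points` are hypothetical and are not taken from the breast cancer example named above.

```python
import math


def pca_first_component(points):
    """Project 2-D points onto their first principal component.

    Returns the unit direction of the component, the 1-D scores of each
    point, and the share of total variance the component explains.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    total = sxx + syy
    if total == 0:
        raise ValueError("data has no variance")

    # Largest eigenvalue and its eigenvector for a symmetric 2x2 matrix.
    top = total / 2 + math.hypot((sxx - syy) / 2, sxy)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, scores, top / total


# Hypothetical data lying exactly on the line y = 2x.
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
direction, scores, explained = pca_first_component(points)
print(direction, scores, explained)
```

Because the sample points lie on a single line, one component carries all of the variance, so the two original features can be replaced by one score per observation without losing information.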
13. Define performance metrics. What is the need for performance metrics? What are the types of
performance metrics based on the classification matrix?
PERFORMANCE METRICS:
Productivity, profit margin, scope and cost are some examples of performance metrics that a
business can track to determine if target objectives and goals are being met. There are different areas
of a business, and each area will have its own key performance metrics.
NEED FOR PERFORMANCE METRICS:
Performance metrics are used to measure the behavior, activities, and performance of a
business. They should take the form of data measured within a defined range, providing a
basis for supporting the achievement of overall business goals.
The most commonly used performance metrics for classification problems are as follows:
Accuracy.
Confusion Matrix.
Precision, Recall, and F1 score.
ROC AUC.
Log-loss.
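Several of the metrics listed above follow directly from the four cells of a binary confusion matrix. The sketch below is a minimal plain-Python illustration; the function name `classification_metrics` and the `actual` and `predicted` label lists are made up for the example, and it assumes a two-class problem with a single positive label.

```python
def classification_metrics(actual, predicted, positive=1):
    """Compute confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "confusion_matrix": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for eight observations.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

ROC AUC and log-loss are left out because they need predicted probabilities rather than hard class labels.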
14. Define Data Mining. Describe all Phases in a typical Data Mining effort.
15. What do you mean by Datasets? Describe all 4 types of Datasets.
2. The sales of ice cream versus the temperature on that day. Here the two variables used are
ice cream sales and temperature.
(Note: If you have only one set of data, say temperature, then it is called a univariate
dataset.)