Organization of The Examination: Theoretical Part - Duration: 1h


Organization of the examination

The examination is organized in two parts:

Theoretical part - duration: 1h


You don’t have access to course notes.
You need:
- A pen
- A calculator

Type of questions:
Open questions and multiple-choice questions. Questions are similar to the ones in
this document (blue part), but not worded exactly the same way.
You may answer in French or in English, as you wish, in your own words.

Practical part - duration: 2h


You have access to course notes (electronic version and/or written notes, Tableau
workbooks, SAS projects) as well as the Internet, but of course no communication is
allowed.
Google Drive, OneDrive and other cloud drives cannot be used; using them will be
considered cheating. Please bring your files on a flash drive. I can lend you one if needed.
You do not work on your own computer but on the computers in the IT room.
Your session is recorded throughout.

Type of questions:
You have to know how to start SAS EM in English; this is part of the exam.

EXCEL: (See exercises)

TABLEAU: (See examples on lol@, additional exercises)
Questions might be:


- Reproduce given plots (see Exercises in the Slides + additional exercises)
- Produce a plot on the basis of a question (without a given plot)
o Example: analyze the sales and profit of each product category with
a Bubble Map. Focus only on Alabama.
- Dashboard to create
SAS: Questions might be:
- A complete analysis with questions about your results and your choices of
parameters. Either I tell you which analysis to do, or you have to decide the
best one according to the objective.
- Examples:
o What is the best tool to predict XX? Which misclassification
percentage do you get?
o What are the most relevant inputs?
o What is the impact of input XX on XX? …
o Use a Decision tree to predict XX.
o Which steps are required to prepare dataset XX for regression? How
many missing values do you have?

Examination content
Theory
THEORY: Principles of supervised/predictive methods
1. Define the two families of methods in Data Mining. What is the difference between them?
2. Explain the two phases of any predictive method.
3. What are the 3 features required in any predictive method? Explain them.
4. What are the three types of prediction? How can we transform one type into another? Which one is
the richest?
5. Explain over-fitting.
6. How can model complexity be adjusted (in general)?
7. What are the two measures used to assess the performance of a predictive method? Define them.
8. Compute the misclassification and the average error on an example dataset with actual and
predicted target values.
9. Define the roles of the training set, the validation set and the test set.
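
As a hedged illustration of question 8, the sketch below computes both measures on a made-up dataset. It assumes a binary target, that misclassification is the share of cases whose predicted class (predicted probability cut at 0.5) differs from the actual class, and that "average error" means the average squared error between the actual target and the predicted probability. The course may define these measures differently; the data and function names are hypothetical.

```python
def misclassification_rate(actual, predicted_class):
    """Share of cases where the predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted_class) if a != p)
    return wrong / len(actual)


def average_squared_error(actual, predicted_prob):
    """Mean of squared differences between actual target (0/1) and predicted probability.

    Assumption: this is what the syllabus calls "average error".
    """
    return sum((a - p) ** 2 for a, p in zip(actual, predicted_prob)) / len(actual)


# Hypothetical example: five cases with a binary target.
actual = [1, 0, 1, 1, 0]
predicted_prob = [0.9, 0.2, 0.4, 0.7, 0.6]

# Assumed decision rule: predict class 1 when the probability is at least 0.5.
predicted_class = [1 if p >= 0.5 else 0 for p in predicted_prob]

misclassification = misclassification_rate(actual, predicted_class)
average_error = average_squared_error(actual, predicted_prob)

print(f"misclassification = {misclassification:.3f}")
print(f"average squared error = {average_error:.3f}")
```

In this example two of the five cases are misclassified (the third and the fifth), while the average squared error also penalizes correct but unsure predictions such as 0.7 for an actual 1.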

THEORY: MBA
10. Define Market Basket Analysis.
11. Define and give the interpretation of the support, confidence and lift of a rule.
12. Compute support, confidence and lift of a rule for a given example.
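
A minimal sketch for question 12, using the usual textbook definitions for a rule A => B: support = share of transactions containing both A and B, confidence = support(A and B) / support(A), lift = confidence / support(B). The baskets and the function name are hypothetical, and the course may present the formulas in a slightly different form.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent => consequent.

    Assumes the antecedent and the consequent each appear in at least one
    transaction; otherwise the divisions below would fail.
    """
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= set(t))
    count_b = sum(1 for t in transactions if b <= set(t))
    count_ab = sum(1 for t in transactions if (a | b) <= set(t))

    support = count_ab / n
    confidence = count_ab / count_a
    lift = confidence / (count_b / n)
    return support, confidence, lift


# Hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
print(f"support={support:.4f} confidence={confidence:.4f} lift={lift:.4f}")
```

Here the lift is below 1, which would be read as: buying bread makes milk slightly less likely than it is across all baskets.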

THEORY: Clustering
13. Explain the K-means algorithm.
14. Explain why you need to standardize variables for the K-means algorithm.
15. What are the possible failings of clustering and how to deal with them?
16. What is the Cubic Clustering Criterion and what is it used for?
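
To illustrate question 14, the sketch below shows how a variable on a large scale can dominate the Euclidean distance that K-means relies on. It assumes z-score standardization (subtract the mean, divide by the standard deviation), which is one common choice and not necessarily the one used in the course; the customers (age, income) are made up.

```python
from math import dist
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column: (value - column mean) / column standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in rows
    ]


# Hypothetical customers: (age in years, yearly income).
customers = [(25, 30000), (60, 30500), (27, 32000), (45, 90000)]
a, b, c, _ = customers

# Raw distances: income differences swamp age differences.
raw_ab, raw_ac = dist(a, b), dist(a, c)

# Distances after standardization: both variables weigh comparably.
za, zb, zc, _ = standardize(customers)
std_ab, std_ac = dist(za, zb), dist(za, zc)

print(f"raw:          d(A,B)={raw_ab:.1f}  d(A,C)={raw_ac:.1f}")
print(f"standardized: d(A,B)={std_ab:.3f}  d(A,C)={std_ac:.3f}")
```

On the raw data, customer A looks closer to B (35 years older, similar income) than to C (almost the same age); after standardization the ordering is reversed.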

THEORY: Decision Trees


17. Explain the prediction rules provided by decision trees (on the basis of an example).
18. Explain how to select relevant inputs in Decision Trees (split search).
19. Compute the entropy, Gini index, misclassification for a provided example of input and split point.
Be able to compare inputs on the basis of their purity or impurity measures. Explain how these
measures are used in the split search algorithm.
20. Explain how missing values are handled in Decision Trees.
21. Explain how to adjust model complexity in Decision Trees (pruning).
22. What are the dangers when the proportions of the target's outcomes/values differ greatly, i.e. when
you try to predict rare outcomes?
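
For question 19, a minimal sketch of the three impurity measures and of how a candidate split can be scored. It assumes the common definitions (entropy in bits, Gini index as 1 minus the sum of squared class proportions, misclassification as 1 minus the largest class proportion) and scores a split by the size-weighted average impurity of its child nodes. The course or SAS EM may use a variant, such as a purity version of Gini; the class counts are hypothetical.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a node, given the class counts in that node."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)


def gini(counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)


def misclassification(counts):
    """Share of cases not in the majority class of the node."""
    return 1 - max(counts) / sum(counts)


def weighted_impurity(children, measure):
    """Impurity of a split: child impurities weighted by child size."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * measure(child) for child in children)


# Hypothetical node with 10 cases of each class, split into two children.
parent = [10, 10]
children = [[8, 2], [2, 8]]

for name, measure in [("entropy", entropy), ("gini", gini),
                      ("misclassification", misclassification)]:
    before = measure(parent)
    after = weighted_impurity(children, measure)
    print(f"{name}: parent={before:.4f} split={after:.4f} gain={before - after:.4f}")
```

Under these assumptions, comparing two inputs amounts to computing the impurity reduction of each one's best split and preferring the larger reduction.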

THEORY: Regression
23. Detail the prediction formula of linear and logistic regression. Compute, for a given individual and
given regression coefficients, the predicted value or the predicted odds ratio.
24. Explain how to interpret the regression coefficients in linear and logistic regression.
25. Explain how to select relevant inputs in regression. Detail one of the methods.
26. Explain how to adjust model complexity in regression.
27. Give 6 possible issues in linear regression and explain how to deal with them.
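
As a hedged sketch for questions 23 and 24, the code below applies the standard prediction formulas: in linear regression the predicted value is the intercept plus the sum of coefficient times input; in logistic regression the same sum gives the logit, whose exponential is the predicted odds, and odds / (1 + odds) is the predicted probability. The coefficients and the individual are invented for illustration.

```python
from math import exp


def linear_predictor(intercept, coefficients, inputs):
    """intercept + sum of coefficient * input value."""
    return intercept + sum(b * x for b, x in zip(coefficients, inputs))


def logistic_prediction(intercept, coefficients, inputs):
    """Return (odds, probability) for a logistic regression."""
    logit = linear_predictor(intercept, coefficients, inputs)
    odds = exp(logit)
    probability = odds / (1 + odds)
    return odds, probability


# Hypothetical coefficients and individual (two inputs).
intercept = -1.0
coefficients = [0.5, 0.25]
individual = [2.0, 4.0]

# Linear regression: the sum itself is the predicted value.
predicted_value = linear_predictor(intercept, coefficients, individual)

# Logistic regression: the same sum is the logit (log-odds).
predicted_odds, predicted_probability = logistic_prediction(
    intercept, coefficients, individual
)

# Odds ratio for a one-unit increase in the first input, other inputs fixed.
odds_ratio_x1 = exp(coefficients[0])

print(predicted_value, predicted_odds, predicted_probability, odds_ratio_x1)
```

The last quantity reflects the usual interpretation of a logistic coefficient: a one-unit increase in the input multiplies the odds by exp(coefficient).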

Practice – Be able to
Excel
1. Use relative and absolute references
2. Import a txt file or a csv file into Excel
3. Compute formulas
4. Plot a histogram using the FREQUENCE/FREQUENCY function
5. Create a subsample of your data
6. Use the RECHERCHEV/VLOOKUP function
7. Use basic and advanced filters
8. Use BD (database) formulas
9. Build a pivot table

Tableau
10. Import data into Tableau and choose the appropriate type of join
11. Use colors, labels, tooltips, annotations.
12. Use dual axis and blended axis plots
13. Change the type of measure: SUM, AVG, MIN, MAX, COUNT, COUNTD, MEDIAN, …
14. Change the type of quick calculation: Percent of Total, Percent Difference, Running Total, …
15. Compute row level and aggregated formulas and choose appropriately between them
16. Use IF-THEN formulas
17. Use CASE formulas
18. Build Histogram and cumulative curve, use a parameter for the bin size
19. Use quick filters based on dimensions or measures
20. Build heat maps, tree maps, bubble maps and word clouds
21. Edit locations of a map, build a filled map, a symbol map and superimpose both
22. Build pie charts
23. Build a scatter plot
24. Build a What-if Analysis
25. Use parameters for various analyses (to change the visualization, …)
26. Build a Top N analysis
27. Use FIXED and INCLUDE LOD appropriately to solve complex problems
28. Build a dashboard
29. Add a URL action
30. Make the plots interact with each other on a dashboard, apply a filter on all plots

Data preparation with SAS EM


31. Start SAS EM in English
32. Create a new project, new diagram, new library and new data source
33. Import an external dataset
34. Explore datasets and take appropriate decisions for subsequent analysis
35. Use the Filter Node
36. Use the Replacement node
37. Use the Impute node
38. Use the Transform node

MBA with SAS EM


39. Check data capacity
40. Build a Market Basket analysis and interpret the results

Clustering with SAS EM


41. Use the cluster node to run a K-means algorithm, analyze and interpret the results

Decision Trees with SAS EM


42. Prepare the dataset for decision trees
43. Use the interactive tool to build a decision tree
44. Build the optimal tree autonomously, change the default parameters, …
45. Analyze and interpret the results of a Decision Tree
46. Use a decision tree to score new cases

Regression with SAS EM


47. Prepare the dataset for regression
48. Use the regression node to perform stepwise, backward or forward regression, optimize
complexity based on an appropriate measure
49. Analyze the results of a Regression (interpret coefficients, iteration plot, assessment measure)
50. Use a regression to score new cases
51. Deal with outliers/skewed distributions, missing values, categorical inputs with many levels
