Organization of the Examination: Theoretical Part - Duration: 1h
Type of questions:
Open questions and multiple-choice questions. Questions are similar to (but not worded exactly like) the ones in this document (blue part).
You may answer in French or in English, as you wish, in your own words.
You must know how to start SAS EM in English; this is part of the exam.
Examination content
Theory
THEORY: Principles of supervised/predictive methods
1. Define the two families of methods in Data Mining. What is the difference between them?
2. Explain the two phases of any predictive method.
3. What are the three features required in any predictive method? Explain them.
4. What are the three types of prediction? How can one type be transformed into another? Which one is the richest?
5. Explain over-fitting.
6. How can model complexity be adjusted (in general)?
7. What are the two measures used to assess the performance of a predictive method? Define them.
8. Compute the misclassification rate and the average error on an example dataset with actual and predicted target values.
9. Define the roles of the training set, the validation set and the test set.
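Questions 7 and 8 ask for two performance measures and how to compute them. The sketch below assumes they are the misclassification rate (share of wrongly classified cases, for a categorical target) and the average squared error (mean of the squared gaps between actual and predicted values); the function names and the toy data are hypothetical. In practice these measures are computed on the validation set (to tune complexity) and on the test set (for an unbiased final estimate).

```python
def misclassification_rate(actual, predicted):
    """Share of observations whose predicted class differs from the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    return errors / len(actual)


def average_squared_error(actual, predicted):
    """Mean of (actual - predicted)^2, e.g. for an interval target."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Made-up binary target: actual class vs predicted class
    actual_class = [1, 0, 1, 1, 0, 0, 1, 0]
    predicted_class = [1, 0, 0, 1, 0, 1, 1, 0]
    print(misclassification_rate(actual_class, predicted_class))  # 2 errors / 8 = 0.25

    # Made-up interval target: actual value vs predicted value
    actual_value = [3.0, 5.0, 2.0, 8.0]
    predicted_value = [2.5, 5.0, 3.0, 6.0]
    # (0.25 + 0 + 1 + 4) / 4 = 1.3125
    print(average_squared_error(actual_value, predicted_value))
```

The same average squared error can also be applied to a 0/1 target compared with predicted probabilities, which is one way to turn a probability prediction into a single error figure.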
THEORY: MBA
10. Define Market Basket Analysis
11. Define and interpret the support, confidence and lift of a rule
12. Compute support, confidence and lift of a rule for a given example.
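For question 12, the following is a minimal sketch of the three rule measures for a rule A => B, using the standard definitions: support = P(A and B), confidence = P(B | A) = support(A and B) / support(A), and lift = confidence / P(B). The function name `rule_metrics` and the five baskets are made up for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent => consequent.

    support    = share of baskets containing both A and B
    confidence = support(A and B) / support(A)
    lift       = confidence / support(B)  (> 1: A and B co-occur more than if independent)
    """
    n = len(transactions)
    a, b = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_b = sum(1 for t in transactions if b <= set(t))
    n_ab = sum(1 for t in transactions if (a | b) <= set(t))
    if n_a == 0 or n_b == 0:
        raise ValueError("antecedent and consequent must each appear in at least one basket")
    support = n_ab / n
    confidence = n_ab / n_a
    # Equal to confidence / (n_b / n), written with counts to avoid rounding error.
    lift = (n_ab * n) / (n_a * n_b)
    return support, confidence, lift


if __name__ == "__main__":
    # Five made-up baskets
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
        {"bread", "milk"},
    ]
    print(rule_metrics(baskets, {"bread"}, {"milk"}))  # (0.6, 0.75, 0.9375)
```

Here 4 of the 5 baskets contain bread, 4 contain milk and 3 contain both, so support = 3/5, confidence = 3/4 and lift = 0.75 / 0.8 = 0.9375. A lift slightly below 1 means that buying bread makes milk marginally less likely than its overall rate, so the rule is not useful despite its high confidence.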
THEORY: Clustering
13. Explain the K-means algorithm
14. Explain why you need to standardize variables for the K-means algorithm
15. What are the possible failings of clustering and how can they be dealt with?
16. What is the Cubic Clustering Criterion and what is it used for?
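Questions 13 and 14 can be illustrated together. The sketch below is a plain-Python version of the classic (Lloyd) K-means loop, preceded by z-score standardization; the function names, the random initialization from `k` observations and the customer data are assumptions made for illustration, not the procedure of any particular tool. Standardizing first matters because K-means relies on Euclidean distance: without it, a variable in large units (income in euros) would outweigh one in small units (age in years).

```python
import math
import random


def standardize(rows):
    """Z-score every column: (x - mean) / std (population standard deviation)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(d)]
    # A constant column carries no information; map it to 0 instead of dividing by 0.
    return [
        [(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(d)]
        for r in rows
    ]


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Alternate an assignment step and a centroid-update step until no point moves."""
    rng = random.Random(seed)
    # Initialization: pick k distinct observations as starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Convergence: stop when no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical customers: [age in years, annual income in euros]
    customers = [[23, 21000], [25, 64000], [47, 22000], [52, 61000], [24, 23000], [50, 63000]]
    labels, _ = kmeans(standardize(customers), k=2)
    print(labels)
```

Because the result depends on the initial centroids, running the loop with several seeds and keeping the best solution is a common way to limit the risk of a poor local optimum.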
THEORY: Regression
23. Detail the prediction formula of linear and logistic regression. Compute, for a given individual and given regression coefficients, the predicted value (linear regression) or the predicted odds (logistic regression).
24. Explain how to interpret the regression coefficients in linear and logistic regression.
25. Explain how to select relevant inputs in regression. Detail one of the methods.
26. Explain how to adjust model complexity in regression.
27. Give 6 possible issues in linear regression and explain how to deal with them.
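For questions 23 and 24, the following is a sketch of the two prediction formulas, assuming the usual parameterization: linear regression predicts y = b0 + b1*x1 + ... + bp*xp, while logistic regression models the logit, log(p / (1 - p)), with the same linear form. For an individual, exp(logit) gives the predicted odds; for a coefficient, exp(bj) gives the odds ratio associated with a one-unit increase in xj, all else fixed. The function names and numbers are hypothetical.

```python
import math


def linear_prediction(intercept, coefs, x):
    """Linear regression: y_hat = b0 + b1*x1 + ... + bp*xp."""
    if len(coefs) != len(x):
        raise ValueError("one coefficient per input is required")
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def logistic_prediction(intercept, coefs, x):
    """Logistic regression: logit(p) = log(p / (1 - p)) = b0 + b1*x1 + ... + bp*xp.

    Returns the logit, the predicted odds exp(logit) and the probability odds / (1 + odds).
    """
    logit = linear_prediction(intercept, coefs, x)
    odds = math.exp(logit)
    return logit, odds, odds / (1 + odds)


def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds when one input rises by `delta`, others fixed."""
    return math.exp(coef * delta)


if __name__ == "__main__":
    # Hypothetical coefficients and individuals
    print(linear_prediction(1.5, [2.0, -0.5], [3, 4]))     # 1.5 + 6 - 2 = 5.5
    print(logistic_prediction(-2.0, [0.5, 1.0], [2, 1]))  # (0.0, 1.0, 0.5)
    print(odds_ratio(0.5))  # odds multiplied by about 1.65 per extra unit
```

In the linear case, a coefficient is read additively (the predicted value changes by bj per unit of xj); in the logistic case it is read multiplicatively on the odds, through exp(bj).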
Practice – Be able to
Excel
1. Use relative and absolute references
2. Import a TXT or CSV file into Excel
3. Compute formulas
4. Plot a histogram using the FREQUENCE/FREQUENCY function
5. Create a subsample of your data
6. Use the RECHERCHEV/VLOOKUP function
7. Use basic and advanced filters
8. Use BD/D database functions (e.g., BDSOMME/DSUM)
9. Build a pivot table
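To make the bin logic behind a FREQUENCY histogram (FREQUENCE in the French version of Excel) concrete, here is a Python sketch of its counting rule. Excel returns one more count than there are bins: values up to the first bound, then values in each interval (previous bound, bound], and finally values above the last bound. The sketch assumes the bounds are sorted ascending; the function name and the data are illustrative only.

```python
import bisect
from itertools import accumulate


def frequency(data, bins):
    """Mimic the counting rule of Excel's FREQUENCY(data_array, bins_array).

    Assumes `bins` holds the class upper bounds, sorted ascending.
    Returns len(bins) + 1 counts; the last one counts values above the last bound.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # bisect_left gives the index of the first bound >= value, so a value
        # equal to a bound is counted in that bound's class, as in Excel.
        counts[bisect.bisect_left(bins, value)] += 1
    return counts


if __name__ == "__main__":
    data = [1, 2, 2, 3, 5, 7, 10]  # made-up values
    bins = [2, 5, 8]               # class upper bounds
    counts = frequency(data, bins)
    print(counts)                    # [3, 2, 1, 1]
    # The running total of the counts gives the cumulative curve.
    print(list(accumulate(counts)))  # [3, 5, 6, 7]
```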
Tableau
10. Import data into Tableau and choose the appropriate type of join
11. Use colors, labels, tooltips, annotations.
12. Use dual axis and blended axis plots
13. Change the aggregation of a measure: SUM, AVG, MIN, MAX, COUNT, COUNTD, MEDIAN, …
14. Change the type of quick table calculation: Percent of Total, Percent Difference, Running Total, …
15. Compute row level and aggregated formulas and choose appropriately between them
16. Use IF-THEN formulas
17. Use CASE formulas
18. Build a histogram and a cumulative curve; use a parameter for the bin size
19. Use quick filters based on dimensions or measures
20. Build heat maps, tree maps, bubble maps and word clouds
21. Edit locations of a map, build a filled map, a symbol map and superimpose both
22. Build pie charts
23. Build a scatter plot
24. Build a What-if Analysis
25. Use parameters for various analyses (e.g., to change the visualization, …)
26. Build a Top N analysis
27. Use FIXED and INCLUDE LOD expressions appropriately to solve complex problems
28. Build a dashboard
29. Add a URL action
30. Make the plots interact with each other on a dashboard, apply a filter on all plots