Review - Big Data Key
_c_ 8. Going through a database to fix formatting issues, correct inaccurate entries, etc., is called:
    a. data persistence          c. data scrubbing
    b. data exhaust              d. falsifying data
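As an illustration of question 8, here is a minimal sketch of data scrubbing in Python. The records, the `scrub` function, and the 0-120 age rule are all made up for this example; real scrubbing rules depend on the database.

```python
# Made-up records with formatting problems and one inaccurate entry.
raw_records = [
    {"name": "  ada LOVELACE ", "age": "36"},
    {"name": "Alan Turing", "age": "41"},
    {"name": "grace   hopper", "age": "-5"},  # an impossible age
]

def scrub(record):
    """Return a cleaned copy of one record (hypothetical cleaning rules)."""
    # Fix formatting: collapse extra spaces and use consistent capitalization.
    name = " ".join(record["name"].split()).title()
    # Fix types: store the age as a number instead of text.
    age = int(record["age"])
    # Flag inaccurate entries: an age outside 0-120 is marked as missing.
    if not 0 <= age <= 120:
        age = None
    return {"name": name, "age": age}

clean_records = [scrub(r) for r in raw_records]
for r in clean_records:
    print(r)
```

Note that the sketch marks the bad age as missing rather than guessing a replacement; inventing a value would be closer to falsifying data than scrubbing it.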
_F_ 9. (T/F) Usable data is always useful data.
_T_ 10. (T/F) Useful data is always usable data.
#s 11-20: Matching
_d_ 11. digital exhaust          a. a quantitative (numeric) measure of data
_f_ 12. deepfake                 b. a label used to describe and categorize metrics; non-numeric
_a_ 13. metric                   c. crowdsourcing gets you close to the right answer
_c_ 14. central-limit theorem    d. data left behind as we use the internet
_i_ 15. cluster analysis         e. data point that is very different from most other data points
_b_ 16. dimension                f. use artificial intelligence to create visual or audio of fake events
_g_ 17. filter bubble            g. search algorithm only lists what it thinks you agree with
_e_ 18. outlier                  h. an artistic visual presentation of data
_h_ 19. viz                      i. finding groups of data that are similar
_j_ 20. regression               j. using data trends to predict how one factor affects another
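Two of the matching terms, regression and outlier, can be shown with a few lines of Python. This is only a sketch: the study-hours data and the visit counts are made up, the line is fit with ordinary least squares, and the "more than 2 standard deviations from the mean" rule is just one common rule of thumb for spotting outliers.

```python
from statistics import mean, pstdev

# Regression: use a data trend to predict how one factor affects another.
hours = [1, 2, 3, 4, 5]          # made-up hours studied
scores = [52, 61, 70, 79, 88]    # made-up test scores

x_bar, y_bar = mean(hours), mean(scores)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
         / sum((x - x_bar) ** 2 for x in hours))
intercept = y_bar - slope * x_bar

def predict(h):
    """Predicted score for h hours, using the fitted line."""
    return intercept + slope * h

print("predicted score for 6 hours:", predict(6))

# Outlier: a data point that is very different from most other data points.
visits = [10, 12, 11, 13, 12, 11, 95]   # made-up daily visit counts
center, spread = mean(visits), pstdev(visits)
outliers = [v for v in visits if abs(v - center) > 2 * spread]
print("outliers:", outliers)
```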
Name: ______________________________________________________________________________ Date:__________ Period:_________
OnRamps Computer Science
Assignment: Big Data Concepts
21. Big Data sets are defined by high _volume_, high _velocity_, and high _variety_.
22. _Machine learning_ is when a computer uses data to craft its own behavior. Two types:
    a. _Supervised learning_ is when the computer is given inputs (such as baseball signs) and outputs, and it develops an algorithm to make predictions (such as when a runner will try to steal a base).
    b. _Unsupervised learning_ is when the outputs are unknown, so the computer is given data and asked to discover patterns and relationships in the data.
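The difference between the two types in question 22 can be sketched in Python. Everything here is an assumption made for illustration: the sign counts are invented, the supervised part uses a simple nearest-neighbor rule, and the unsupervised part just splits the data at its largest gap, which is a very basic way of finding groups and not a standard named algorithm.

```python
# Supervised: inputs AND known outputs are given.
# Made-up data: (times the coach touches his cap, what the runner did).
labeled = [(0, "stay"), (1, "stay"), (2, "stay"),
           (4, "steal"), (5, "steal"), (6, "steal")]

def predict(touches):
    """Nearest neighbor: copy the output of the closest known input."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - touches))
    return nearest[1]

# Unsupervised: only data is given, with no outputs to learn from.
unlabeled = [1.0, 1.2, 0.9, 5.1, 4.8, 5.3]   # made-up measurements

def two_groups(values):
    """Split the sorted values into two groups at the largest gap."""
    ordered = sorted(values)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]

print(predict(5))             # prediction learned from labeled examples
print(two_groups(unlabeled))  # pattern discovered without any labels
```

In the supervised half the computer is told the right answers for past cases; in the unsupervised half it is never told what the groups mean, only that the data falls into them.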
23. Circle all of the following which are file types used to store large amounts of data:
24. When using a search engine, we often simply type in terms we want included in the search.
    a. What symbol is used to indicate the search engine should find an exact term?   " " [quotes]
    b. What symbol is used to indicate the search engine should exclude a term?   - [dash or minus sign]
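To see how the two symbols from question 24 change a search, here is a toy search function in Python. It is only a sketch: the `documents` list and the `search` function are made up, matching is plain substring matching, and real search engines work very differently behind the scenes.

```python
import shlex

# Made-up "web pages" to search through.
documents = [
    "big data needs careful data scrubbing",
    "data that is big can be hard to store",
    "big data and deepfake detection",
]

def search(query, docs):
    """Toy search: quotes keep a phrase together, a leading - excludes a term."""
    terms = shlex.split(query)   # "big data" stays one term because of the quotes
    required = [t.lower() for t in terms if not t.startswith("-")]
    excluded = [t[1:].lower() for t in terms if t.startswith("-")]
    results = []
    for doc in docs:
        text = doc.lower()
        has_all = all(t in text for t in required)
        has_banned = any(t in text for t in excluded)
        if has_all and not has_banned:
            results.append(doc)
    return results

print(search('big data', documents))              # both words, anywhere
print(search('"big data"', documents))            # the exact phrase
print(search('"big data" -deepfake', documents))  # exact phrase, minus a term
```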
25. Identify the different types of data analysis or visualization represented below.
    [Figures not reproduced.] Answers: cluster analysis, linear regression, jitter, heat map, outlier
Note: This review is only meant to cover vocabulary and basic concepts. You may also see questions from the
readings, practice quizzes, and presentations included in the Big Data unit.