Big Data Dataset Categories Pattern Identification


Big Data Dataset Categories
  high-volume, high-velocity, high-variety, high-veracity, high-value

Pattern Identification
  association rules: actionable, trivial, inexplicable
  Apriori algorithm
  clustering: k-means (stages: assign, update, reassignment)

Numerical summaries
  measures of central tendency
  measures of variation or dispersion
  measures of association
  rules: Chebyshev’s inequality rule, empirical rule
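The k-means stages named above (assign, update, reassignment) can be illustrated with a short Python sketch. It is not taken from the course material: it assumes numeric points, Euclidean distance, and initial centroids taken from the first k points, and `kmeans` is a hypothetical helper name.

```python
import math

def kmeans(points, k, max_iter=100):
    """Hypothetical minimal k-means: repeat the assign and update stages
    until no point is reassigned to a different cluster."""
    # Assumption: initial centroids are simply the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign stage: each point joins the cluster of its nearest centroid
        # (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Reassignment check: stop once no point changed cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update stage: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.2, 0.8), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

In this run the point (1.5, 2.0) is first assigned to cluster 1 and then reassigned to cluster 0 after the first update stage, which is the reassignment behaviour the stages describe. Practical implementations usually use smarter initialisation (for example, several random starts).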

Text analytics
  text representation, bag of words, n-grams, token/term, document, corpus
  term frequency, inverse document frequency (IDF), term frequency inverse document frequency (TFIDF)
  cosine distance
  named entity extraction

Exploratory Data Analysis
  plots: box and whisker plot, quantile-quantile (q-q) plot, lattice plot
  data reduction
  dimensionality reduction: forward selection, backward elimination, decision tree induction, feature extraction
  data discretization: binning, clustering

Outlier Detection
  outlier types: global, contextual, collective
  analysis types: univariate analysis, bivariate analysis, multivariate analysis, time series analysis
  statistical techniques: parametric, non-parametric
  distance-based/unsupervised, supervised, semi-supervised
  clustering: k-means, cluster-based local outlier factors (CBLOF)
  non-clustering: interquartile range (IQR)
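The text analytics terms listed above (bag of words, term frequency, IDF, TFIDF, cosine distance) fit together as sketched below. This is one common weighting under stated assumptions: lowercase whitespace tokenisation, raw-count term frequency, and IDF = log(N / df) with no smoothing. Libraries often use smoothed variants, and the function names here are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(corpus):
    """Bag-of-words TF-IDF vectors for a list of documents.
    Assumptions: lowercase whitespace tokens, tf = raw count,
    idf = log(N / df) with no smoothing."""
    docs = [text.lower().split() for text in corpus]
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors

def cosine_distance(u, v):
    """1 - cosine similarity; an all-zero vector is treated as distance 1.0."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

corpus = ["big data analytics", "big data science", "cooking recipes"]
vocab, vecs = tfidf_vectors(corpus)
print(cosine_distance(vecs[0], vecs[1]) < cosine_distance(vecs[0], vecs[2]))  # True
```

With this unsmoothed IDF, a term that appears in every document gets weight 0, so it contributes nothing to the cosine distance between documents.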

Statistics / Mathematics
  mean, median, mode, range, robustness
  quantile, quintile, quartile, percentile
  population, frequency, sampling, statistical estimator, standard error, confidence interval
  variance, standard deviation, z-score, skewness
  probability: discrete, continuous
  distributions: binomial, geometric, normal, uniform

Model Evaluation Measures
  confusion matrix, sensitivity, specificity, recall, precision, accuracy, error rate, f-score
  cross-validation, bias, bias-variance

Modeling
  machine learning algorithms, statistical models, predictive modeling
  concept, feature vector, instance/example, target
  linear regression: mean squared error, error term, residual, coefficient of determination (R²), standard error of estimate
  logistic regression
  decision trees: pre-pruning, post-pruning, feature splitting, entropy, information gain

Classification
  classification rules, one rule (1R) algorithm, rule-based model
  k nearest neighbor (kNN)
  naïve Bayes, Bayes’ theorem, Laplace smoothing

Statistics Analysis
  descriptive statistics, inferential statistics
  correlation, covariance
  hypothesis testing: null hypothesis, alternative hypothesis, statistical significance, p-value, critical region, type I error, type II error

Visualization
  bar chart, line graph, histogram, frequency polygon, scatter plot, stem and leaf plot, cross-tabulation, box and whisker plot, quantile-quantile (q-q) plot, lattice plot

Statistics Variable Categories
  qualitative, quantitative, nominal, ordinal, binary, independent, random
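Most of the model evaluation measures listed above can be computed from the four cells of a binary confusion matrix. The sketch below assumes a two-class problem with a designated positive class, takes "f-score" to mean F1 (the harmonic mean of precision and recall), and does not guard against zero denominators; both function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (tp, fp, tn, fn) for a binary classifier; `positive` marks the positive class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def evaluation_measures(tp, fp, tn, fn):
    """Common evaluation measures; assumes no denominator is zero."""
    recall = tp / (tp + fn)          # sensitivity: actual positives that were found
    specificity = tn / (tn + fp)     # actual negatives correctly rejected
    precision = tp / (tp + fp)       # predicted positives that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity/recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "error rate": 1 - accuracy,
        # F-score taken here as F1, the harmonic mean of precision and recall.
        "f-score": 2 * precision * recall / (precision + recall),
    }

counts = confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(counts)  # (2, 1, 2, 1)
```

Sensitivity and recall are the same quantity, which is why they share one entry; error rate is simply 1 minus accuracy.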

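For the decision tree terms listed above (feature splitting, entropy, information gain), one common splitting criterion can be sketched as follows. It assumes categorical features stored in dicts, class labels in a parallel list, and entropy measured in bits (log base 2); the helper names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Parent entropy minus the size-weighted entropy of the child groups
    produced by splitting `rows` (dicts) on a categorical `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    children = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - children

rows = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rain", "windy": True},
    {"outlook": "rain", "windy": False},
]
labels = ["yes", "yes", "no", "no"]
best = max(["outlook", "windy"], key=lambda f: information_gain(rows, labels, f))
print(best)  # outlook
```

Here splitting on `outlook` yields two pure groups (gain of 1 bit), while `windy` leaves each group evenly mixed (gain of 0), so a tree built greedily on information gain would split on `outlook` first.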
Module 6: Big Data Analysis & Science Lab
Big Data Science Certified Professional (BDSCP) Program
Official Mind Map Supplement
Copyright © Arcitura Education Inc. www.arcitura.com
