Deployment: Cheat Sheet: Machine Learning With KNIME Analytics Platform
TRAINING

BAGGING

Bagging: A method for training multiple classification/regression models on different randomly drawn subsets of the training data. The final prediction is based on the predictions provided by all the models, thus reducing the chance of overfitting.

Tree Ensemble of Decision/Regression Trees: Ensemble model of multiple decision/regression trees trained on different subsets of data. Data subsets with the same number of rows and columns or fewer are bootstrapped from the original training set. The final prediction is based on a hard vote (majority rule) or a soft vote (averaging all probabilities or numeric predictions) across all involved trees.

Random Forest of Decision/Regression Trees: Ensemble model of multiple decision/regression trees trained on different subsets of data. Data subsets with the same number of rows are bootstrapped from the original training set. At each node, the split is performed on a subset of sqrt(x) features from the original x input features. The final prediction is based on a hard vote (majority rule) or a soft vote (averaging all probabilities or numeric predictions) across all involved trees.

Self-Organizing Tree Algorithm (SOTA): A special Self-Organizing Map (SOM) neural network. Its cell structure is grown using a binary tree topology.

Fuzzy c-Means: One of the most widely used fuzzy clustering algorithms. It works similarly to the k-Means algorithm, but it allows data points to belong to more than one cluster, with different degrees of membership.

BOOSTING

Boosting: A method for training a set of classification/regression models iteratively. At each step, a new model is trained on the prediction errors and added to the ensemble to improve the results from the previous model state, leading to higher accuracy after each iteration.
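The bagging scheme above can be sketched in a few lines of Python. The `one_nn` learner and the toy data are made up purely for illustration; the only assumptions are a learner factory that trains a model on a sample and models that are callables:

```python
import random

def bagging_predict(train, learner, x, n_models=25, seed=0):
    """Train n_models learners on bootstrap samples of `train`
    and combine their class predictions by majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]  # bootstrap: draw with replacement
        model = learner(sample)
        votes.append(model(x))
    return max(set(votes), key=votes.count)          # hard vote (majority rule)

# Toy learner: 1-nearest-neighbour on (feature, label) pairs.
def one_nn(sample):
    def predict(x):
        return min(sample, key=lambda p: abs(p[0] - x))[1]
    return predict

train = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.1, "b")]
label = bagging_predict(train, one_nn, 1.0)  # majority of the 25 votes: "b"
```

A soft vote would instead average the class probabilities (or numeric predictions) returned by the individual models.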
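As an illustration of the fuzzy membership idea, here is a minimal one-dimensional c-Means sketch (toy data and naive initialisation, not KNIME's implementation): each point holds a degree of membership in every cluster, and cluster centers are membership-weighted means.

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    """1-D fuzzy c-means sketch with fuzzifier m."""
    centers = points[:c]                       # naive initialisation
    u = []
    for _ in range(iters):
        # Membership update: inverse relative distance raised to
        # 2/(m-1); each point's memberships sum to 1 across clusters.
        u = []
        for x in points:
            d = [max(abs(x - ctr), 1e-9) for ctr in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                                for k in range(c)) for j in range(c)])
        # Center update: membership-weighted mean of the points.
        centers = [sum((u[i][j] ** m) * points[i] for i in range(len(points)))
                   / sum(u[i][j] ** m for i in range(len(points)))
                   for j in range(c)]
    return centers, u

points = [0.0, 0.1, 0.2, 4.0, 4.1, 4.2]
centers, u = fuzzy_c_means(points)   # centers settle near the two groups
```

Unlike k-Means, a point between the two groups would end up with substantial membership in both clusters rather than a single hard assignment.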
Gradient Boosted Regression Trees: Ensemble model combining multiple sequential simple regression trees into a stronger model. The algorithm builds the model stagewise: at each iteration, a simple regression tree is fitted to predict the residuals of the current model, following the gradient of the loss function. This leads to an increasingly accurate and complex overall model. The same regression trees can also be used for classification.

XGBoost: An optimized, distributed library for machine learning models in the gradient boosting framework, designed to be highly efficient, flexible, and portable. It features regularization parameters to penalize complex models, effective handling of sparse data for better performance, parallel computation, and more efficient memory usage.

Custom Ensemble Model: Combines different supervised models to form a custom ensemble model. The final prediction can be based on a majority vote as well as on the average or other functions of the output results.

EVALUATION

Evaluation: Various scoring metrics for assessing model quality - in particular, a model's predictive ability or propensity to error.

Confusion Matrix: A representation of a classification task's success through the counts of matches and mismatches between the actual and predicted classes, aka true positives, false negatives, false positives, and true negatives. One class is arbitrarily selected as the positive class.

Accuracy Measures: Evaluation metrics for a classification model calculated from the values in the confusion matrix, such as sensitivity and specificity, precision and recall, or overall accuracy.

Numeric Error Measures: Evaluation metrics for numeric prediction models quantifying the error size and direction. Common metrics include RMSE, MAE, or R². Most of these metrics depend on the range of the target variable.

ROC Curve: A graphical representation of the performance of a binary classification model, with false positive rates on the x-axis and true positive rates on the y-axis. Multiple points for the curve are obtained for different classification thresholds. The area under the curve is the metric value.

Cross-Validation: A model validation technique for assessing how the results of a machine learning model will generalize to an independent dataset. A model is trained and validated N times on different pairs of training set and test set, both extracted from the original dataset. Some basic statistics on the resulting N error or accuracy measures give insights on overfitting and generalization.

RECOMMENDATION ENGINES

Recommendation Engines: A set of algorithms that use known information about user preferences to predict items of interest.

Association Rules: The node reveals regularities in co-occurrences of multiple products in large-scale transaction data recorded at points of sale. Based on the a-priori algorithm, the most frequent itemsets in the dataset are used to generate recommendation rules.

Collaborative Filtering: Based on the Alternating Least Squares (ALS) technique, it produces recommendations (filtering) about the interests of a user by comparing their current preferences with those of multiple users (collaborating).

Resources

• E-Books: Learn even more with the KNIME books, from basic concepts in "KNIME Beginner's Luck", to advanced concepts in "KNIME Advanced Luck", through to examples of real-world case studies in "Practicing Data Science". Available for purchase at knime.com/knimepress
• KNIME Blog: Engaging topics, challenges, industry news, and knowledge nuggets at knime.com/blog
• KNIME Hub: Search, share, and collaborate on KNIME workflows, nodes, and components with the entire KNIME community at hub.knime.com
• KNIME Forum: Join our global community and engage in conversations at forum.knime.com
• KNIME Server: The enterprise software for team-based collaboration, automation, management, and deployment of data science workflows as analytical applications and services. Visit knime.com/server for more information.
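The stagewise residual fitting described under Gradient Boosted Regression Trees can be sketched for squared loss, where the negative gradient is simply the residual. This is a toy illustration with one-feature regression stumps, not KNIME's implementation:

```python
def fit_stump(xs, ys):
    """Best single-split regression stump: predicts the mean of ys
    on each side of a threshold chosen to minimise squared error."""
    best = None
    for t in xs:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (lm if x <= t else rm)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, n_rounds=20, lr=0.3):
    """Each stump is fitted to the residuals (negative gradient of the
    squared loss) of the current ensemble, then added with shrinkage lr."""
    base = sum(ys) / len(ys)                      # start from the mean
    pred = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 10, 10, 10]
model = gradient_boost(xs, ys)   # converges towards the step function
```

Each round shrinks the remaining residual, so the ensemble grows more accurate and more complex, as the entry above describes.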
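The confusion-matrix counts, and the accuracy measures derived from them, reduce to a few lines; a minimal sketch with made-up labels:

```python
def confusion_matrix(actual, predicted, positive):
    """Counts of matches and mismatches, with one class chosen as positive."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return tp, fn, fp, tn

actual    = ["spam", "spam", "ham", "ham", "spam"]
predicted = ["spam", "ham",  "ham", "spam", "spam"]
tp, fn, fp, tn = confusion_matrix(actual, predicted, positive="spam")

accuracy  = (tp + tn) / (tp + fn + fp + tn)  # overall accuracy: 3/5
precision = tp / (tp + fp)                   # of predicted positives, how many are right
recall    = tp / (tp + fn)                   # aka sensitivity
```

Swapping which class counts as positive swaps precision/recall with their counterparts for the other class, which is why the choice is called arbitrary above.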
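The numeric error measures mentioned under Evaluation are short formulas; the mean signed error below (an extra illustration, not one of the metrics named above) shows what "direction" of the error means:

```python
def mae(actual, predicted):
    """Mean absolute error: average error size."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalises large errors more heavily."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5

def mean_signed_error(actual, predicted):
    """Sign reveals the direction: > 0 means the model over-predicts on average."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

actual, predicted = [3.0, 5.0, 2.0], [2.0, 5.0, 4.0]
# mae = 1.0, rmse = sqrt(5/3), mean_signed_error = 1/3
```

Because all three are in the units of the target, their magnitude only makes sense relative to the target variable's range, as the entry notes.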
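A minimal sketch of how the ROC points and the area under the curve are obtained from classification scores (toy data, trapezoidal integration; assumes both classes are present):

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs, one per distinct classification threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(s >= t and l for s, l in zip(scores, labels))
        fp = sum(s >= t and not l for s, l in zip(scores, labels))
        pts.append((fp / neg, tp / pos))
    return pts

def auc(points):
    """Trapezoidal area under the ROC curve."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

scores = [0.9, 0.8, 0.4, 0.3]          # model scores for the positive class
labels = [True, True, False, True]     # True marks the positive class
area = auc(roc_points(scores, labels)) # 2/3 for this toy data
```

Each threshold yields one (false positive rate, true positive rate) point; sweeping the threshold traces the curve whose area is the metric value.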
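The N-times train/validate loop behind Cross-Validation can be sketched as a k-fold split; the mean-predictor "model" and MAE scoring are stand-ins for illustration:

```python
import random

def k_fold_scores(data, train_and_score, k=5, seed=0):
    """Shuffle, split into k folds, and score a model trained on the
    other k-1 folds against each held-out fold in turn."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        scores.append(train_and_score(train, test))
    return scores

# Toy model: predict the mean of the training targets; score with MAE.
def mean_model_mae(train, test):
    mean = sum(train) / len(train)
    return sum(abs(y - mean) for y in test) / len(test)

data = [float(x) for x in range(20)]
scores = k_fold_scores(data, mean_model_mae)
avg = sum(scores) / len(scores)   # basic statistics over the N runs
```

The spread of the N scores is what hints at overfitting: a model that only memorises its training folds scores noticeably worse, and more erratically, on the held-out folds.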
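The a-priori idea behind Association Rules can be sketched as level-wise frequent-itemset mining. This uses simplified candidate generation and toy baskets; a full implementation would also prune candidates with infrequent subsets before counting:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support=0.5):
    """A-priori principle: an itemset can only be frequent if its subsets
    are, so candidates are grown level by level from frequent sets and
    kept when their support (fraction of transactions) is high enough."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    while level:
        counts = {c: sum(c <= t for t in transactions) for c in level}
        keep = [c for c in level if counts[c] / n >= min_support]
        frequent.update({c: counts[c] / n for c in keep})
        # simplified candidate generation: unions one item larger
        level = sorted({a | b for a, b in combinations(keep, 2)
                        if len(a | b) == len(a) + 1}, key=sorted)
    return frequent

baskets = [frozenset(t) for t in
           [{"beer", "chips"}, {"beer", "chips", "salsa"},
            {"beer", "diapers"}, {"chips", "salsa"}]]
freq = frequent_itemsets(baskets, min_support=0.5)
```

Recommendation rules such as "chips → salsa" are then generated from these frequent itemsets by comparing an itemset's support with that of its subsets.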