Group 10
RapidMiner (Part-02)
Presented by:
Shabit Mahmud (1901044)
Md. Mostafizur Rahaman (1901024)
Md. Al Morsaline (1901020)
D. M. Khalid Mahmud (1901004)
Outline
• 15.5 SAMPLING AND MISSING VALUE TOOLS
• 15.6 OPTIMIZATION TOOLS
• 15.7 INTEGRATION WITH R
• Conclusion
15.5 SAMPLING AND MISSING VALUE TOOLS
Presented By:
Shabit Mahmud (ID-1901044)
Sampling Tool in RapidMiner
Drag and drop the dataset into the process first.
Search for the sampling operator in the operators list, then drag and drop it into the process.
We can also change the sampling type, for example to relative or absolute.
If we select the relative sampling type with a sampling rate of 0.6, 60% of the data will be retrieved.
After running, the following interface appears.
Define the sampling rate and press the Run button.
After running, we can see a result like this.
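The difference between relative and absolute sampling can be sketched outside RapidMiner as well. The following is a minimal Python illustration, not RapidMiner's implementation; the function name `sample_rows`, the `mode` values, and the ten-row dataset are assumptions made for this example.

```python
import random


def sample_rows(rows, mode, value, seed=None):
    """Return a random sample of rows (hypothetical helper, not a RapidMiner API).

    mode="relative": value is the fraction of rows to keep (e.g. 0.6 for 60%).
    mode="absolute": value is the exact number of rows to keep.
    """
    rng = random.Random(seed)
    if mode == "relative":
        k = round(len(rows) * value)
    elif mode == "absolute":
        k = int(value)
    else:
        raise ValueError(f"unknown sampling mode: {mode}")
    # Never ask for more rows than exist, or fewer than zero.
    k = max(0, min(k, len(rows)))
    return rng.sample(rows, k)


# Hypothetical dataset of 10 rows.
rows = [{"id": i, "value": i * 1.5} for i in range(10)]

relative_sample = sample_rows(rows, "relative", 0.6, seed=42)  # 60% of the rows
absolute_sample = sample_rows(rows, "absolute", 3, seed=42)    # exactly 3 rows

print(len(relative_sample), len(absolute_sample))
```

With a sampling rate of 0.6 on ten rows, the relative sample keeps six rows, which matches the 60% described above.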
Missing Value Tool in RapidMiner
Drag and drop the dataset into the process first.
After running, the statistics view shows the number of missing values for each column.
Now find the attribute-selection operator and drag and drop it into the process.
If we run it, we can see the values of the selected attribute.
Search for Replace Missing Values in the operators list.
Drag and drop the Replace Missing Values operator and select the option by which we want to replace the missing values.
If we select average, the missing values of the selected attribute will be replaced by the average value.
After running, we can see that there are no missing values for the selected attribute.
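Replacing missing values by the average can be illustrated with a short Python sketch. This is only an illustration of the idea, not RapidMiner's code; the helper names, the `age` attribute, and the sample rows are assumptions, and missing values are represented as `None`.

```python
from statistics import mean


def count_missing(rows, attribute):
    """Count the rows where the attribute is missing (None)."""
    return sum(1 for r in rows if r[attribute] is None)


def replace_missing_with_average(rows, attribute):
    """Return new rows with missing values replaced by the attribute's average."""
    present = [r[attribute] for r in rows if r[attribute] is not None]
    if not present:
        # Nothing to average over, so return unchanged copies.
        return [dict(r) for r in rows]
    avg = mean(present)
    return [
        {**r, attribute: avg if r[attribute] is None else r[attribute]}
        for r in rows
    ]


# Hypothetical dataset with two missing values in the "age" attribute.
rows = [{"age": 20.0}, {"age": None}, {"age": 30.0}, {"age": None}]

missing_before = count_missing(rows, "age")
cleaned = replace_missing_with_average(rows, "age")
missing_after = count_missing(cleaned, "age")

print(missing_before, missing_after)
print([r["age"] for r in cleaned])
```

The average is computed only from the values that are present, so the two missing entries both become the mean of 20.0 and 30.0.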
15.6 OPTIMIZATION TOOLS
Presented By:
Md. Mostafizur Rahaman (Roll-1901024)
Basics of Optimization tools:
• The fundamental principle on which the optimization tools work is the
concept of a nested operator.
• The Optimize operator performs two tasks: it determines what values to set
for the selected parameters for each iteration, and when to stop the
iterations.
• To set the parameter values, RapidMiner uses one of three algorithms:
Grid search
Greedy search
Evolutionary search
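The nested-operator idea can be sketched in a few lines of Python. This is a conceptual illustration only and not how RapidMiner is implemented; the names `optimize` and `inner_process`, the list of candidate values, and the "lower score is better" convention are all assumptions made for the example.

```python
def optimize(inner_process, candidate_values, max_iterations):
    """Outer operator: choose a parameter value per iteration and decide when to stop."""
    best_value, best_score = None, float("inf")
    for iteration, value in enumerate(candidate_values):
        if iteration >= max_iterations:  # task 2: decide when to stop iterating
            break
        score = inner_process(value)     # run the nested (inner) process
        if score < best_score:           # assumption: lower score is better
            best_value, best_score = value, score
    return best_value, best_score


def inner_process(x):
    """Stand-in for the nested process, e.g. training and scoring a model."""
    return (x - 2) ** 2


# Task 1: the candidate list decides which parameter value each iteration uses.
best_value, best_score = optimize(inner_process, [0, 1, 2, 3, 4], max_iterations=10)
print(best_value, best_score)
```

Grid, greedy, and evolutionary search differ only in how the outer operator proposes the next parameter value; the inner process stays the same.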
For the function $y = f(x) = x^6 + x^3 - 7x^2 - 3x + 1$:
The local minimum of y = -4.33 at x = -1.3 is found at the very first iteration. This corresponds to the window [-1.5,
0]. If the grid had not spanned the entire domain [-1.5, 1.5], the optimizer would have reported the local minimum
as the best performance.
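The effect of the grid span can be checked with a small Python sketch. This is a plain evenly spaced grid search written for illustration, not the RapidMiner operator; the step count of 0.01 per grid point is an assumption.

```python
def f(x):
    """Objective function from the slide: x^6 + x^3 - 7x^2 - 3x + 1."""
    return x**6 + x**3 - 7 * x**2 - 3 * x + 1


def grid_search(func, low, high, steps):
    """Evaluate func on an evenly spaced grid and return the best (x, y)."""
    best_x, best_y = low, func(low)
    for i in range(1, steps + 1):
        x = low + (high - low) * i / steps
        y = func(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y


# Grid over the entire domain [-1.5, 1.5].
full_x, full_y = grid_search(f, -1.5, 1.5, 300)
# Grid restricted to the window [-1.5, 0].
left_x, left_y = grid_search(f, -1.5, 0.0, 150)

print(f"full domain: x = {full_x:.2f}, y = {full_y:.2f}")
print(f"left window: x = {left_x:.2f}, y = {left_y:.2f}")
```

Restricted to [-1.5, 0], the search stops at the local minimum near x = -1.3, while the grid over the full domain finds a lower value on the positive side, near x = 1.2.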
Greedy algorithms are by nature typically biased toward coverage of a large number of cases or a quick payback in the objective function. In this case, the performance of the quadratic optimizer is marginally worse than that of a grid search, requiring about 100 shots to hit the global minimum (compared to 90 for a grid), as seen in the figure:
With the evolutionary search, it is evident that it takes far fewer steps to get to the global minimum with a high degree of confidence: about 18 iterations as opposed to 90 or 100. The key concepts for understanding this algorithm are mutation and cross-over, both of which can be controlled from the RapidMiner GUI.
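Mutation and cross-over can be illustrated with a minimal Python sketch on the same function. This is a generic, simplified evolutionary search and not RapidMiner's algorithm; the population size, the averaging cross-over, the Gaussian mutation, and all parameter names are assumptions chosen for the example.

```python
import random


def f(x):
    """Objective function from the slide: x^6 + x^3 - 7x^2 - 3x + 1."""
    return x**6 + x**3 - 7 * x**2 - 3 * x + 1


def evolutionary_search(func, low, high, pop_size=10, generations=20,
                        mutation_scale=0.1, seed=0):
    """Minimize func on [low, high] with a simple evolutionary loop."""
    rng = random.Random(seed)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    history = []  # best objective value seen at the start of each generation
    for _ in range(generations):
        population.sort(key=func)
        history.append(func(population[0]))
        parents = population[: pop_size // 2]      # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2                    # cross-over: blend two parents
            child += rng.gauss(0, mutation_scale)  # mutation: small random change
            children.append(min(high, max(low, child)))  # stay inside the domain
        population = parents + children
    best = min(population, key=func)
    return best, func(best), history


best_x, best_y, history = evolutionary_search(f, -1.5, 1.5)
print(f"best x = {best_x:.2f}, best y = {best_y:.2f}")
```

Because the better half of each generation is carried over unchanged, the best value never gets worse from one generation to the next; the mutation scale and the cross-over rule are the knobs that correspond to the settings mentioned above.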
15.7 INTEGRATION WITH R
Presented By:
Md. Al Morsaline (Roll-1901020)
INTEGRATION WITH R
Integration with R in RapidMiner refers to the seamless collaboration between
RapidMiner, a data science platform, and the R programming language, a popular
tool for statistical computing and data analysis. This integration allows users to
leverage the strengths of both RapidMiner and R, combining their respective
capabilities to enhance the data analysis and modeling process.
INTEGRATION WITH R
Data Visualization:
Data visualization tools available within RapidMiner were introduced.
Understanding the descriptive nature of the data through visualization is a crucial step in the data science
process.
Data Transformation and Reshaping:
Tools for transforming and reshaping incoming data were introduced.
Changing data types and restructuring data into different tabular forms were covered to facilitate subsequent
analysis.
Extracting Insights:
The ultimate goal is to develop optimized and high-quality models that can extract valuable insights from the
data.
Any Questions?