
15: Getting Started with RapidMiner (Part-02)
Presented by:
Shabit Mahmud (1901044)
Md. Mostafizur Rahaman (1901024)
Md. Al Morsaline (1901020)
D. M. Khalid Mahmud (1901004)
Outline
• 15.5 SAMPLING AND MISSING VALUE TOOLS
• 15.6 OPTIMIZATION TOOLS
• 15.7 INTEGRATION WITH R
• Conclusion
15.5 SAMPLING AND MISSING VALUE TOOLS
Presented By:
Shabit Mahmud (ID-1901044)
Sampling Tool in RapidMiner
Drag and drop the dataset into the process first.
Search for the sampling operator in the Operators panel and drag and drop it into the process.
We can also change the sampling type, for example to relative or absolute.
If we select the relative sampling type with a sampling rate of 0.6, 60% of the data will be retrieved.
After running, the following interface appears.
Define the sampling rate and press the Run button.
After running, we can see the result.
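The difference between relative and absolute sampling can be sketched outside RapidMiner in a few lines of Python. This is only an illustration of the idea, not RapidMiner's implementation: the function name sample_rows and its parameters (mode, ratio, size, seed) are hypothetical names chosen for this example.

```python
import random


def sample_rows(rows, mode="relative", ratio=0.6, size=None, seed=42):
    """Return a random subset of rows (hypothetical helper, not a RapidMiner API).

    mode="relative": keep a fraction `ratio` of the rows.
    mode="absolute": keep exactly `size` rows.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    if mode == "relative":
        k = round(len(rows) * ratio)
    elif mode == "absolute":
        k = size
    else:
        raise ValueError("mode must be 'relative' or 'absolute'")
    return rng.sample(rows, k)  # sampling without replacement


rows = list(range(100))                                # stand-in for 100 examples
relative_subset = sample_rows(rows, mode="relative", ratio=0.6)
absolute_subset = sample_rows(rows, mode="absolute", size=25)
```

With 100 rows and a ratio of 0.6, the relative subset holds 60 rows, which matches the 60% described above.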
Missing Value Tool in RapidMiner
Drag and drop the dataset into the process first.
After running, the Statistics view shows the number of missing values for each column.
Now search for the Select Attributes operator and drag and drop it.
If we run it, we can see the values of the selected attribute.
Search for Replace Missing Values in the Operators panel.
Drag and drop the Replace Missing Values operator and select the option by which we want to replace the missing values.
If we select average, the missing values of the selected attribute will be replaced by the average value.
After running, we can see that there are no missing values left for the selected attribute.
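As a rough illustration of what replacing by average means, the sketch below counts the missing entries of one column and fills them with the mean of the values that are present. It is a plain Python analogy, not RapidMiner code; the column name ages and the helper replace_missing_with_average are assumptions made for this example, and missing values are represented by None.

```python
from statistics import mean


def replace_missing_with_average(values):
    """Fill None entries with the mean of the non-missing entries (hypothetical helper)."""
    present = [v for v in values if v is not None]
    average = mean(present)
    return [average if v is None else v for v in values]


ages = [22, None, 30, 26, None]                 # made-up column with two missing values
missing_before = sum(v is None for v in ages)   # what the statistics view would report
filled = replace_missing_with_average(ages)
missing_after = sum(v is None for v in filled)
```

The average is computed from the present values only, so the replaced entries do not shift the mean of the column.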
15.6 OPTIMIZATION TOOLS
Presented By:
Md. Mostafizur Rahaman (Roll-1901024)
Basics of optimization tools:
• The fundamental principle on which the optimization tools work is the concept of a nested operator.
• The Optimize operator performs two tasks: it determines what values to set for the selected parameters in each iteration, and it decides when to stop the iterations.
• To set parameter values, RapidMiner uses three algorithms:
    • Grid search
    • Greedy search
    • Evolutionary search
For the function y = f(x) = x^6 + x^3 - 7x^2 - 3x + 1

For x in [-1.5, 2], there are two minima: a local minimum of y = -4.33 at x = -1.3 and a global minimum of y = -7.96 at x = 1.18. It will be demonstrated how to use RapidMiner to search for these minima using the Optimize operators.
Optimize Parameters (Grid) operator:
The parameters being optimized are the bounds of the interval [lower bound, upper bound], chosen so that the objective of minimizing the function y = f(x) can be achieved. As seen in the function plot, the entire domain of x has to be traversed in small enough steps so that the point at which y hits the global minimum can be found.
Searching for an optimum within a fixed window that slides across:
Configuring the grid search optimizer.
Progression of the grid search optimization.

The local minimum of y = -4.33 at x = -1.3 is found at the very first iteration. This corresponds to the window [-1.5, 0]. If the grid had not spanned the entire domain [-1.5, 1.5], the optimizer would have reported the local minimum as the best performance.
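The effect of the grid's coverage can be reproduced with a minimal Python sketch. It is a simplification: instead of sliding a [lower bound, upper bound] window as the RapidMiner process does, it evaluates f(x) directly on evenly spaced points. The function grid_search and the step size of 0.01 are assumptions made for this illustration.

```python
def f(x):
    """Objective from the slides: y = x^6 + x^3 - 7x^2 - 3x + 1."""
    return x**6 + x**3 - 7 * x**2 - 3 * x + 1


def grid_search(lower, upper, steps):
    """Evaluate f on steps + 1 evenly spaced points and return (best_x, best_y)."""
    best_x, best_y = lower, f(lower)
    for i in range(1, steps + 1):
        x = lower + (upper - lower) * i / steps
        y = f(x)
        if y < best_y:          # keep the lowest value seen so far
            best_x, best_y = x, y
    return best_x, best_y


# Grid over the whole domain [-1.5, 2] with a step of 0.01.
full_x, full_y = grid_search(-1.5, 2.0, 350)

# Grid restricted to the window [-1.5, 0]: only the local minimum is visible.
left_x, left_y = grid_search(-1.5, 0.0, 150)
```

On the full domain the best grid point lands close to the global minimum quoted above, while the restricted grid can only report the local minimum near x = -1.3, which is the failure mode described in the text.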
Greedy algorithms are by nature typically biased toward coverage of a large number of cases or a quick payback in the objective function. In this case, the performance of the quadratic (greedy) optimizer is marginally worse than that of the grid search, requiring about 100 iterations to hit the global minimum (compared to 90 for the grid search), as seen in the figure:
With the evolutionary search, it is evident that it takes far fewer steps to get to the global minimum with a high degree of confidence: about 18 iterations as opposed to 90 or 100. Key concepts for understanding this algorithm are mutation and cross-over, both of which can be controlled using the RapidMiner GUI.
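Mutation and cross-over can be illustrated with a small evolutionary search on the same function. This is a generic textbook-style sketch, not the algorithm RapidMiner uses internally; the function evolve, the population size, the number of generations, and the mutation width sigma are all assumptions, and the number of evaluations here is not comparable to the iteration counts reported above.

```python
import random


def f(x):
    """Objective from the slides: y = x^6 + x^3 - 7x^2 - 3x + 1."""
    return x**6 + x**3 - 7 * x**2 - 3 * x + 1


def evolve(lower=-1.5, upper=2.0, pop_size=20, generations=40, sigma=0.1, seed=0):
    """Minimize f on [lower, upper] with a simple evolutionary loop (illustrative only)."""
    rng = random.Random(seed)
    population = [rng.uniform(lower, upper) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=f)
        parents = population[: pop_size // 2]        # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = w * a + (1 - w) * b              # cross-over: blend two parents
            child += rng.gauss(0, sigma)             # mutation: small random shift
            child = min(max(child, lower), upper)    # stay inside the domain
            children.append(child)
        population = parents + children
    best_x = min(population, key=f)
    return best_x, f(best_x)


best_x, best_y = evolve()
```

Because the better half of the population survives each generation, the best value found never gets worse, and the mutated children gradually concentrate around the global minimum.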
15.7 INTEGRATION WITH R
Presented By:
Md. Al Morsaline (Roll-1901020)
INTEGRATION WITH R
Integration with R in RapidMiner refers to the seamless collaboration between
RapidMiner, a data science platform, and the R programming language, a popular
tool for statistical computing and data analysis. This integration allows users to
leverage the strengths of both RapidMiner and R, combining their respective
capabilities to enhance the data analysis and modeling process.
FIGURE : Integration with R.


With the integration of R in RapidMiner, we can:
• Utilize R Scripts: RapidMiner allows users to embed R scripts directly within its workflow. This means that those of us who are proficient in R can incorporate custom R code to perform specialized analyses, visualizations, or data manipulations that might not be readily available in RapidMiner's standard toolbox.
• Leverage R Libraries: R has a vast ecosystem of libraries for various statistical and machine learning tasks. Integrating R with RapidMiner enables us to access these libraries and tap into their functionality from within the RapidMiner interface, enhancing the range of analyses that can be conducted.
• Customize Analysis: The integration enables us to create highly customized analysis pipelines. We can combine the visual data preparation and modeling capabilities of RapidMiner with the scripting power of R, resulting in tailored and advanced analysis approaches.
• Hybrid Workflows: We can seamlessly combine RapidMiner's native operators with R scripts in a single workflow. This enables the construction of hybrid workflows where each tool is used to its best advantage, creating more efficient and powerful analyses.
• Rich Visualization: R is renowned for its visualization capabilities. By integrating R, we can generate complex and informative visualizations that enrich data analysis presentations and insights.
• Access to R Models: We can apply models developed in R directly within RapidMiner, enhancing the deployment of R-based solutions through the user-friendly interface of RapidMiner.
Conclusion
Presented By:
D. M. Khalid Mahmud (ID-1901004)
The chapter covered the main aspects of building data science models using RapidMiner, emphasizing the importance of data preparation, visualization, transformation, and applying algorithms to achieve meaningful insights through the RapidMiner platform. The overall discussion covered:

• Accessing RapidMiner Process and Data Files:
    • The RapidMiner process developed in this chapter can be accessed from the companion site of the book at www.IntroDataScience.com.
    • RapidMiner process files (*.rmp) can be downloaded and imported into RapidMiner through the "File > Import Process" option.
    • Data files can be imported into RapidMiner using the "File > Import Data" option.

• High-Level Overview of RapidMiner Tools:
    • The chapter provided an introduction to the basic graphical user interface of RapidMiner.
    • Various methods for importing and exporting data from RapidMiner were discussed.

• Data Visualization:
    • Data visualization tools available within RapidMiner were introduced.
    • Understanding the descriptive nature of the data through visualization is a crucial step in the data science process.

• Data Transformation and Reshaping:
    • Tools for transforming and reshaping incoming data were introduced.
    • Changing data types and restructuring data into different tabular forms were covered to facilitate subsequent analysis.

• Resampling and Handling Missing Values:
    • Tools for resampling available data and addressing missing values were discussed.
    • Dealing with imbalanced datasets and handling missing information are important data preprocessing steps.

• Applying Algorithms and Optimization:
    • Once familiar with data preparation, users can apply various algorithms for analysis.
    • Optimization operators were introduced to fine-tune machine learning algorithms and develop high-quality models.

• Extracting Insights:
    • The ultimate goal is to develop optimized and high-quality models that can extract valuable insights from the data.
Any Questions?
