
Introduction to SAS-JMP

Measure Phase
Course Outline

• Starter Overview
> Introducing JMP features
> Starter Description
• Data File Opening
> Accessing data and opening different types of files in JMP
> Understanding modeling type
> Introducing the Table panel, Columns panel, and Rows panel
• Data Exploration and Manipulation
> Using the Columns and Rows menus
> Using the Tables menu to reshape data
> Using the Tabulate option to create summary tables
• Graphical Data Exploration
> Exploring a continuous single variable
> Exploring a discrete single variable
• Reporting and Presenting Results
> Modifying a journal
> Using a journal for presentations
> Saving and sharing results
Learning Objectives

• How to navigate the JMP interface
• How to manage data effectively in JMP
• How to explore data by using JMP software's extensive graphical capabilities
• How to create and manage reports in JMP
JMP Starter Overview
What is JMP?

• JMP is software for interactive statistical graphics. It uses an interactive graphical interface to display and analyze data. Its features include:
> A spreadsheet for viewing, editing, entering, and manipulating data;
> A broad range of graphical and statistical methods for data analysis;
> An extensive design of experiments module;
> Options to select and display subsets of the data;
> Data management tools for sorting and combining tables;
> A calculator for each table column to compute values;
> A facility for grouping data and computing summary statistics;
> Special plots, charts, and communication capability for quality improvement
techniques;
> Tools for moving analysis results between applications and for printing;
> A scripting language for saving frequently used routines.
JMP Starter: Overview
• When JMP opens, you see the JMP Starter, which provides a good way to get started if you haven't used JMP before. It gives alternative access to most commands found on the main menu or on toolbars.
• You can close the JMP Starter if you want; it is not required for running JMP. This section gives an overview of the JMP Starter and briefly describes its tabbed pages and the items or commands on them.
• To open or close the JMP Starter, select View > JMP Starter.
JMP Starter: File Page
• The JMP Starter first appears with the File page showing. Most commands on the File page correspond to File menu commands on the main menu bar.
• The commands on the File page open JMP data tables or other kinds of JMP windows, which is often what you need to do first.
JMP Starter: The Basic Page

• The Basic page addresses univariate and bivariate analyses.
• It shows how to examine variables one at a time by looking at their distributions and comparing them to known distributions.
• When there are two variables, a single response (y) and a single factor (x), JMP performs the appropriate bivariate analysis according to whether the variables are continuous or categorical.
• These analyses can be run by clicking their buttons on the Basic page or by selecting them from the Analyze menu.
JMP Starter: The Model Page
• The Model page gives choices for fitting all types of models, from simple regression and analysis of variance to complex nonlinear fits.
JMP Starter: The Multivariate Page
• The Multivariate page introduces ways to look at continuous variables when they are considered as responses only; there are no factor or independent variables. Multivariate exploration with correlations and cluster analysis lets you look at many variables at the same time.
JMP Starter: The Survival Page
• Survival data contain duration times until the occurrence of a specific event and are sometimes referred to as event-time response data. The event can be a failure, such as the failure of an engine, or the death of a patient.
JMP Starter: The Graph Page
• The Graph page corresponds to commands on the main menu that produce plots and charts of summarized data, three-dimensional spinning plots, contour plots, and ternary plots.
JMP Starter: The Surface Page
• The Surface page corresponds to commands on the main menu that produce multidimensional graphs, such as profilers and surface plots.
JMP Starter: The QC Page
• The QC page accesses the commands on the Graph menu that are used in statistical quality control, except for the Capability button, which accesses the Distribution command on the Analyze menu.
JMP Starter: The DOE Page
• The DOE page corresponds to the commands on the DOE main menu. These commands construct classical and custom experimental designs and save them in a JMP table. Selecting a design type presents an environment for describing the factors, responses, and other specifications needed to make a design of that type.
JMP Starter: The Tables Page
• The Tables page corresponds to the commands on the Tables main menu.
Data File Import
Importing Data from Other Applications

• There are four ways to import data into JMP from other applications:
> Copy and paste from Excel.
> Open a file created by another software program that JMP recognizes.
> Text import.
> Connect to a database.
File Menu > Open Data Table: Opening Existing JMP Files
• If you want to open a file that is a JMP data table (.jmp), script (.jsl), or journal (.jrn):
Step 1: Select File > Open.
Step 2: Select the file type (see Slide 35 for the file list).
Step 3: Highlight the name of the file you would like to open.
Step 4: Click Open.
File Menu > Open Data Table: File Types
• See Step 2 on the previous slide for selecting the file type, by operating system, for:
> Opening a JMP data table
> Opening a JMP script
> Opening a JMP journal


Importing Data from Other Applications

• Copy and paste from Excel.
> Import the data from Excel into JMP using copy and paste.
Importing Data from Other Applications

• Open a file created by another software program that JMP recognizes.
> JMP recognizes a variety of file types, including Excel (.xls), SAS (.sd2, .sd5, .sas7bdat), and MS Access (.mdb).
Importing Data from Other Applications

• Text import.
> Import the text file into JMP.
Specifying Data Types and Modeling Types

• The data type of a column determines how its values are formatted in the data grid, how they are stored internally, and whether they can be used in calculations. The three data types are:
> Numeric columns contain only numbers, with or without a decimal point.
> Character columns contain any characters, including numbers. In character columns, numbers are treated as characters only, so they are discrete values rather than continuous values.
> Row State columns contain row state information: whether rows are excluded, hidden, labeled, colored, or marked.
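To see why the data type matters, here is a minimal Python sketch (an illustration, not JMP's own import logic). The helper `infer_data_type` and the sample values are hypothetical: a column is treated as Numeric only if every non-empty value parses as a number, and the sort shows that digits stored as characters order as text, which is why such values behave as discrete labels.

```python
def infer_data_type(values):
    """Return "Numeric" if every non-empty value parses as a number, else "Character".

    Hypothetical helper for illustration only; JMP's own rules are more involved.
    """
    for value in values:
        if value == "":
            continue  # treat an empty string as a missing value
        try:
            float(value)
        except ValueError:
            return "Character"
    return "Numeric"


weights = ["61.5", "58", "", "70.25"]  # hypothetical measurements
names = ["Alice", "Bob"]               # hypothetical labels
lot_ids = ["10", "9", "100"]           # digits, but stored as characters

print(infer_data_type(weights))  # Numeric
print(infer_data_type(names))    # Character

# Stored as characters, the values sort as text (discrete labels)...
print(sorted(lot_ids))                    # ['10', '100', '9']
# ...whereas converted to numbers they sort by magnitude (continuous values).
print(sorted(float(x) for x in lot_ids))  # [9.0, 10.0, 100.0]
```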
Data Exploration and Manipulation
Elements of a JMP Data Table

Table panel
The table panel contains the data table name, a small red triangle icon, and a list of any table properties or scripts.

Columns panel
The columns panel contains a list of the columns in the data table, each column's modeling type, and any attributes assigned to the columns.

Rows panel
The rows panel shows the number of total rows, selected (highlighted) rows, excluded rows, hidden rows, and labeled rows.
Selecting Rows and Columns
[Figure: a data table with selected columns and selected rows highlighted]
Selecting Excluded, Hidden, or Labeled Rows

• Sometimes you need to automatically highlight, or select, certain types of rows so you can see or manipulate them among the many rows of a data table. To select rows that have been marked as excluded, hidden, or labeled:
Selecting Cells with Specific Values
If you are looking for a specific value in a data table, there are several ways to quickly select it: (1) using Rows > Row Selection > Select Matching Cells.

Highlight the cells that contain the value(s) you want to locate.

To find all matching cells within the active data table, select Rows > Row Selection > Select Matching Cells.
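The effect of Select Matching Cells can be sketched in Python, assuming a table stored as a list of row dictionaries. The function name `select_matching_cells` and the data are hypothetical; note that the sketch reports 0-based row indices, whereas JMP numbers rows from 1.

```python
def select_matching_cells(rows, column, highlighted_values):
    """Return the 0-based indices of rows whose `column` value is one of the highlighted values."""
    targets = set(highlighted_values)
    return [i for i, row in enumerate(rows) if row[column] in targets]


table = [  # hypothetical data
    {"name": "A", "sex": "F", "age": 12},
    {"name": "B", "sex": "F", "age": 13},
    {"name": "C", "sex": "M", "age": 12},
    {"name": "D", "sex": "M", "age": 14},
]

# Highlighting a cell containing 12 in the age column selects every row with age 12.
print(select_matching_cells(table, "age", [12]))  # [0, 2]
```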
Selecting Cells with Specific Values Cont'd
(2) Using Rows > Row Selection > Select Where

If you currently have rows selected in the data table, click an option under Currently Selected Rows to tell JMP how to handle that current selection:

• Clear Current Selection: Removes the highlight from the currently selected rows and selects all rows that contain the specified value.
• Extend Current Selection: Keeps the currently selected rows selected and also selects the rows in which the specified value has been found.
• Select From Current Selection: Selects only those currently selected rows that contain the specified value.
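The three Currently Selected Rows options behave like set operations on row indices. A minimal Python sketch under that reading; the function `select_where`, its `mode` strings, and the data are hypothetical, not JMP names:

```python
def select_where(rows, condition, current=(), mode="clear"):
    """Return the set of selected row indices after a Select Where-style query.

    mode="clear":  discard the current selection, keep only the matching rows
    mode="extend": keep the current selection and add the matching rows
    mode="from":   keep only the currently selected rows that also match
    """
    matches = {i for i, row in enumerate(rows) if condition(row)}
    if mode == "clear":
        return matches
    if mode == "extend":
        return set(current) | matches
    if mode == "from":
        return set(current) & matches
    raise ValueError(f"unknown mode: {mode!r}")


def is_tall(row):
    """Hypothetical condition: height greater than 60."""
    return row["height"] > 60


heights = [{"height": h} for h in (59, 61, 65, 55)]  # hypothetical data; rows 1 and 2 match
already_selected = {0, 1}

print(sorted(select_where(heights, is_tall, already_selected, "clear")))   # [1, 2]
print(sorted(select_where(heights, is_tall, already_selected, "extend")))  # [0, 1, 2]
print(sorted(select_where(heights, is_tall, already_selected, "from")))    # [1]
```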
Subset, Concatenate, Join, and More

• You can perform a wide variety of data management tasks on JMP data:
> Create a new data table from a subset of rows and columns from another data table
> Sort by any number of columns
> Stack multiple columns into a single column
> Split a column into several columns
> Transpose rows and columns
> Concatenate multiple tables end to end
> Join two tables side by side
> Update columns in a table with values from another table
Creating a Subset Table
You can produce a new data table that is a subset of all rows and columns, of only the highlighted rows and columns, or of randomly selected rows from the active data table.
To create a subset: Select Tables > Subset.
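The three kinds of subset described above can be sketched in Python, assuming a table held as a list of row dictionaries. `subset` and `random_subset` are hypothetical helpers; the seed only makes the random draw repeatable.

```python
import random


def subset(rows, columns=None, row_indices=None):
    """Copy the chosen rows (default: all) keeping only the chosen columns (default: all)."""
    if row_indices is None:
        row_indices = range(len(rows))
    picked = [rows[i] for i in row_indices]
    if columns is None:
        return [dict(row) for row in picked]
    return [{c: row[c] for c in columns} for row in picked]


def random_subset(rows, n, seed=None):
    """Copy n randomly chosen rows, keeping them in their original order."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(rows)), n))
    return subset(rows, row_indices=chosen)


table = [  # hypothetical data
    {"name": "A", "age": 12, "height": 59},
    {"name": "B", "age": 13, "height": 61},
    {"name": "C", "age": 14, "height": 65},
    {"name": "D", "age": 12, "height": 55},
]

# "Highlighted" rows 0 and 2, keeping only two columns:
print(subset(table, columns=["name", "height"], row_indices=[0, 2]))
# [{'name': 'A', 'height': 59}, {'name': 'C', 'height': 65}]

print(len(random_subset(table, 2, seed=1)))  # 2
```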
Sorting Data Tables
You can sort a JMP data table by columns in either ascending or descending order. By default,
columns sort in ascending order. You can either create a new table that contains the sorted
values, or you can replace the original table with the sorted table.
To sort a data table: Select Tables > Sort.

[Figure: the same table sorted in ascending and in descending order]
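A multi-column sort can be sketched in Python by sorting on the least significant column first and relying on Python's stable sort (which stays stable even with `reverse=True`), so earlier columns take priority. `sort_table` and the data are hypothetical; the function returns a new list, which corresponds to creating a new table, while reassigning the result to the original name corresponds to replacing it.

```python
def sort_table(rows, by, descending=None):
    """Return a new list sorted by the columns in `by` (highest priority first).

    `descending` holds one True/False flag per column; the default is all ascending.
    """
    if descending is None:
        descending = [False] * len(by)
    result = list(rows)
    # Sort by the lowest-priority column first; each later stable sort keeps
    # the earlier order among ties, so the first column in `by` wins overall.
    for column, desc in reversed(list(zip(by, descending))):
        result.sort(key=lambda row: row[column], reverse=desc)
    return result


table = [  # hypothetical data
    {"sex": "M", "height": 65},
    {"sex": "F", "height": 59},
    {"sex": "M", "height": 70},
    {"sex": "F", "height": 62},
]

# Sex ascending, then height descending within each sex.
for row in sort_table(table, by=["sex", "height"], descending=[False, True]):
    print(row["sex"], row["height"])
# F 62
# F 59
# M 70
# M 65
```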
Stacking Columns
You can rearrange your data table by stacking two or more columns into a single new column,
preserving the values from the other columns. Or, you can stack a set of columns into multiple
groups.
To stack columns: Select Tables > Stack.
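Stacking can be sketched in Python, assuming a list of row dictionaries. The function `stack`, the output column names `Label` and `Data`, and the data are hypothetical choices for this illustration.

```python
def stack(rows, stack_columns, id_columns, label_name="Label", data_name="Data"):
    """Turn each listed column into its own row, carrying the id columns along.

    The default output column names are assumptions for this sketch.
    """
    stacked = []
    for row in rows:
        for column in stack_columns:
            new_row = {c: row[c] for c in id_columns}
            new_row[label_name] = column  # which original column the value came from
            new_row[data_name] = row[column]
            stacked.append(new_row)
    return stacked


wide = [  # hypothetical data: one row per part, one column per trial
    {"part": "P1", "trial1": 5.1, "trial2": 5.3},
    {"part": "P2", "trial1": 4.8, "trial2": 4.9},
]

for row in stack(wide, ["trial1", "trial2"], ["part"]):
    print(row)
# {'part': 'P1', 'Label': 'trial1', 'Data': 5.1}
# {'part': 'P1', 'Label': 'trial2', 'Data': 5.3}
# {'part': 'P2', 'Label': 'trial1', 'Data': 4.8}
# {'part': 'P2', 'Label': 'trial2', 'Data': 4.9}
```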
Splitting Columns
You can create a new data table from the active table by dividing one column into several new
columns. You can divide one column into several columns according to the values of one or more
variables.
To split a column: Select Tables > Split.
[Figure: the Split dialog, with the Split Label Col and Split Columns roles]
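Split is the reverse of stacking. In the Python sketch below (a hypothetical helper, not JMP's implementation), `label_column` plays the part of the Split Label Col role, supplying the new column names, and `data_column` plays the part of the Split Columns role, supplying the values.

```python
def split(rows, label_column, data_column, group_columns):
    """Spread `data_column` into one new column per distinct `label_column` value.

    Rows that share the same `group_columns` values are merged into one output row.
    """
    result = {}
    for row in rows:
        key = tuple(row[c] for c in group_columns)
        out = result.setdefault(key, {c: row[c] for c in group_columns})
        out[str(row[label_column])] = row[data_column]
    return list(result.values())  # dicts keep insertion order (Python 3.7+)


stacked = [  # hypothetical data in stacked form
    {"part": "P1", "Label": "trial1", "Data": 5.1},
    {"part": "P1", "Label": "trial2", "Data": 5.3},
    {"part": "P2", "Label": "trial1", "Data": 4.8},
    {"part": "P2", "Label": "trial2", "Data": 4.9},
]

for row in split(stacked, label_column="Label", data_column="Data", group_columns=["part"]):
    print(row)
# {'part': 'P1', 'trial1': 5.1, 'trial2': 5.3}
# {'part': 'P2', 'trial1': 4.8, 'trial2': 4.9}
```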
Transposing Rows and Columns
You can create a new JMP table that is a transposed version of the active data table. The columns
of the active table are the rows of the new table, and its rows are the new table’s columns.
To transpose rows and columns: Select Tables > Transpose.
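Transposing amounts to swapping rows and columns, which Python's `zip(*rows)` does directly. A minimal sketch with a hypothetical helper and data:

```python
def transpose(columns, rows):
    """Return a list of new rows, one per original column.

    Each new row starts with the original column name so no information is lost;
    this labeling choice is an assumption of the sketch.
    """
    return [[name, *values] for name, values in zip(columns, zip(*rows))]


columns = ["name", "height", "weight"]  # hypothetical data
rows = [["A", 59, 95], ["B", 61, 123]]

for new_row in transpose(columns, rows):
    print(new_row)
# ['name', 'A', 'B']
# ['height', 59, 61]
# ['weight', 95, 123]
```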
Attaching Tables (Concatenating)
When you concatenate tables in JMP, you append them end to end; the new table contains one column for each distinct column name in the original tables.
To concatenate tables: Select Tables > Concatenate.
[Figure: data tables Trial 1 and Trial 2]
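The rule that each distinct column name becomes one column can be sketched in Python. The helper and the Trial 1/Trial 2 values are hypothetical, and filling absent columns with `None` stands in for JMP's missing values.

```python
def concatenate(*tables):
    """Append tables end to end, with one output column per distinct column name.

    A cell is filled with None (missing) when its source table lacks that column.
    """
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    return [{name: row.get(name) for name in columns} for table in tables for row in table]


trial_1 = [{"part": "P1", "y": 5.1}, {"part": "P2", "y": 4.8}]  # hypothetical data
trial_2 = [{"part": "P3", "y": 5.0, "operator": "B"}]

for row in concatenate(trial_1, trial_2):
    print(row)
# {'part': 'P1', 'y': 5.1, 'operator': None}
# {'part': 'P2', 'y': 4.8, 'operator': None}
# {'part': 'P3', 'y': 5.0, 'operator': 'B'}
```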


Joining Tables
You can combine two data tables into one new table by selecting Tables > Join.
Tables can be joined in three different ways:
1. By combining them according to row number
2. In a Cartesian fashion, where you form a new table consisting of all the pairs from two
original tables
3. By matching the values in one or more columns that exist in both data tables
[Figure: data tables Trial 1 and Trial 2]
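Joining Tables Cont'd
1. By combining them according to row number
[Figure: two tables combined row by row into one joined table]
2. In a Cartesian fashion, where you form a new table consisting of all the pairs from two original tables
[Figure: two tables combined into a table of all row pairs]
3. By matching the values in one or more columns that exist in both data tables
[Figure: two tables combined on matching column values]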
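The three join types can be sketched in Python, assuming tables held as lists of row dictionaries. The function names and data are hypothetical. For simplicity, when both tables share a column name the right-hand value wins, unequal lengths are truncated in the row-number join, and the matching join keeps only matched rows; JMP offers its own options for these cases, which this sketch does not reproduce.

```python
from itertools import product


def join_by_row_number(left, right):
    """Pair row i of `left` with row i of `right`; zip stops at the shorter table."""
    return [{**a, **b} for a, b in zip(left, right)]


def join_cartesian(left, right):
    """Every pairing of a row from `left` with a row from `right`."""
    return [{**a, **b} for a, b in product(left, right)]


def join_matching(left, right, on):
    """Pair rows whose values in the `on` columns are equal (matched rows only)."""
    index = {}
    for b in right:
        index.setdefault(tuple(b[c] for c in on), []).append(b)
    return [{**a, **b} for a in left for b in index.get(tuple(a[c] for c in on), [])]


names = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]  # hypothetical data
heights = [{"id": 2, "height": 61}, {"id": 3, "height": 64}]
sizes = [{"size": "S"}, {"size": "L"}]
colors = [{"color": "red"}, {"color": "blue"}]

print(join_by_row_number(sizes, colors))
# [{'size': 'S', 'color': 'red'}, {'size': 'L', 'color': 'blue'}]
print(len(join_cartesian(sizes, colors)))  # 4
print(join_matching(names, heights, on=["id"]))
# [{'id': 2, 'name': 'B', 'height': 61}]
```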
Updating a Table

If you have two data tables and would like to update one table with data from a second table,
select Tables > Update.

[Figure: data tables Big Class and New Heights]
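An update that matches rows on a shared column can be sketched in Python. The helper and the values below are hypothetical stand-ins for the Big Class and New Heights tables; the sketch returns a new list, whereas Tables > Update writes the new values into the existing table.

```python
def update(target, source, match_on, columns):
    """Return a copy of `target` in which `columns` are overwritten by values from the
    `source` row with the same `match_on` value. Unmatched rows are left unchanged;
    if `source` repeats a key, the last occurrence wins."""
    lookup = {row[match_on]: row for row in source}
    updated = []
    for row in target:
        new_row = dict(row)
        match = lookup.get(row[match_on])
        if match is not None:
            for c in columns:
                if c in match:
                    new_row[c] = match[c]
        updated.append(new_row)
    return updated


big_class = [{"name": "A", "height": 59}, {"name": "B", "height": 61}]  # hypothetical values
new_heights = [{"name": "B", "height": 62}]

print(update(big_class, new_heights, match_on="name", columns=["height"]))
# [{'name': 'A', 'height': 59}, {'name': 'B', 'height': 62}]
```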


The Summarize and Tabulate Commands

• You can summarize JMP data in two ways:
> Create a table that contains columns of summary statistics
> Tabulate data so it is displayed in a tabular format
Summarizing Columns
The Tables > Summary command creates a summary table, which is a table that contains columns
of summary statistics from another data table, called the source table.

[Figure: the Summary dialog, showing the group variables and the list of summary statistics]
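What Tables > Summary computes can be sketched in Python: one row per group value, with summary-statistic columns. The helper, the data, and the column names (chosen to resemble JMP's style, such as `N Rows` and `Mean(height)`) are assumptions.

```python
from statistics import mean, stdev


def summarize(rows, group_by, column):
    """One output row per group: the group value, the row count, and the mean and
    standard deviation of `column` (None when a group has a single row)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_by], []).append(row[column])
    return [
        {
            group_by: key,
            "N Rows": len(values),
            f"Mean({column})": mean(values),
            f"Std Dev({column})": stdev(values) if len(values) > 1 else None,
        }
        for key, values in groups.items()
    ]


source = [  # hypothetical source table
    {"sex": "F", "height": 59},
    {"sex": "M", "height": 65},
    {"sex": "F", "height": 62},
    {"sex": "M", "height": 70},
    {"sex": "F", "height": 61},
]

for row in summarize(source, group_by="sex", column="height"):
    print(row)
```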
Graphical Data Exploration
Graph Types for a Single Variable

Continuous data: Histogram, Box Plot, Normal Probability Plot
Discrete data: Bar Graph, Pareto Diagram


Histogram

• Definition
> A histogram is a vertical bar chart that displays the distribution of a set of data.
> Use it to examine the shape and spread of sample data.

• Purpose
> Graphically displays the distribution of a set of data:
- The location (central tendency)
- The dispersion (variability)
- The shape
> Shows anomalies in the data set, if present:
- Extreme values
- Double peaks (mixture)
- Missing data (gap, truncation)
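Behind each histogram is a simple count of values per bin. A minimal Python sketch, assuming equal-width bins that include their left edge (JMP's exact binning rules may differ); the helper and data are hypothetical:

```python
def histogram_counts(values, start, width, n_bins):
    """Count values per bin [start + k*width, start + (k+1)*width).

    The last bin also includes its right edge; values outside the range are ignored.
    """
    counts = [0] * n_bins
    end = start + width * n_bins
    for v in values:
        if v < start or v > end:
            continue
        k = min(int((v - start) // width), n_bins - 1)
        counts[k] += 1
    return counts


heights = [51, 56, 58, 60, 61, 62, 63, 63, 64, 65, 66, 68, 70]  # hypothetical data
counts = histogram_counts(heights, start=50, width=5, n_bins=4)
for k, c in enumerate(counts):
    low = 50 + 5 * k
    print(f"{low}-{low + 5}: {'#' * c}")
# 50-55: #
# 55-60: ##
# 60-65: ######
# 65-70: ####
```

Re-running with a different `width` shows how the apparent shape depends on the choice of bins.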
Probability Plots

• Definition
> A probability plot is constructed so that the points fall on a straight line if the data fit the distribution.
> This is a useful technique because the human eye is better at judging whether something is straight than at judging the shape of a curve.
> The vertical axis represents the estimated cumulative probability.

• Purpose
> Determine whether a data set fits the Normal distribution. Probability plots provide a more decisive check than a histogram, whose apparent shape can change with the choice of bins.
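The coordinates of a normal probability plot can be sketched in Python with the standard library's `statistics.NormalDist` (Python 3.8+). The plotting position rank/(n + 1) is one common convention and an assumption here, and the correlation `r` is only an informal measure of straightness, not the confidence-band check JMP draws; helper names and data are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_points(values):
    """Pair each sorted value with the standard Normal quantile of its plotting position.

    Uses the position rank / (n + 1); other conventions exist, so this is an
    assumption, not necessarily JMP's exact formula.
    """
    data = sorted(values)
    n = len(data)
    z = NormalDist()  # standard Normal: mean 0, standard deviation 1
    return [(x, z.inv_cdf(rank / (n + 1))) for rank, x in enumerate(data, start=1)]


def straightness(points):
    """Pearson correlation between values and Normal quantiles (1.0 = perfectly straight)."""
    xs = [x for x, _ in points]
    zs = [q for _, q in points]
    mx, mz = sum(xs) / len(xs), sum(zs) / len(zs)
    sxz = sum((x - mx) * (q - mz) for x, q in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((q - mz) ** 2 for q in zs)
    return sxz / (sxx * szz) ** 0.5


points = normal_quantile_points([63, 58, 61, 66, 60, 64, 62])  # hypothetical data
for x, q in points:
    print(f"{x:>4} {q:+.3f}")
print(f"r = {straightness(points):.3f}")
```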
Box Plot

• Definition
> A five-number summary of a continuous variable: Minimum, Q1, Median, Q3, Maximum.
- The length of the box is the IQR (Q3 - Q1).
- The line within the box marks the median.
- Two lines (whiskers) extend to the outermost data values within 1.5 × IQR of Q1 or Q3.
- Potential outliers are shown as points.
• Purpose
> Box plots are very useful when large numbers of observations are involved and when two or more data sets are being compared.
> They help indicate whether a distribution is skewed and whether there are any unusual observations (outliers) in the data set.
[Figure: annotated box plot on a 0 to 200 scale, labeling the maximum, Q3, means diamond, median (Q2), Q1, the outermost data value within 1.5 × IQR of the 25th percentile, and an outlier]
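The box plot quantities can be computed with Python's `statistics.quantiles` (Python 3.8+). Its default "exclusive" method interpolates at position (n + 1)p, one of several quartile definitions; treat any match with JMP's quartiles as an assumption. The helper and data are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Five-number summary, IQR, whisker ends, and points beyond 1.5 × IQR."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4)  # default method="exclusive"
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "min": data[0], "Q1": q1, "median": median, "Q3": q3, "max": data[-1],
        "IQR": iqr,
        "whisker_low": inside[0],    # outermost value within 1.5 × IQR of Q1
        "whisker_high": inside[-1],  # outermost value within 1.5 × IQR of Q3
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


summary = box_plot_summary([52, 55, 58, 60, 61, 62, 63, 64, 66, 95])  # hypothetical data
print(summary["Q1"], summary["median"], summary["Q3"])                   # 57.25 61.5 64.5
print(summary["whisker_low"], summary["whisker_high"], summary["outliers"])  # 52 66 [95]
```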
Histogram, NPP and Boxplot in JMP

• Choose Analyze > Distribution.
• Select height and click Y, Columns.
• Click OK.
Histogram – Analyzing the Result

[Figure: Distributions report for height, with a histogram on a 50 to 70 scale]

Quantiles
100.0%  maximum   70
99.5%             70
97.5%             69.975
90.0%             68
75.0%   quartile  65
50.0%   median    63
25.0%   quartile  60.25
10.0%             56.2
2.5%              51.025
0.5%              51
0.0%    minimum   51

Moments
Mean              62.55
Std Dev           4.2423385
Std Err Mean      0.6707726
Upper 95% Mean    63.906766
Lower 95% Mean    61.193234
N                 40

Interpreting Histograms:
The golden rule when analyzing a histogram is not to read too much into it; summarize the results in everyday language. For example:
"This histogram shows that heights range from about 51 to 70, with most of the readings between 60 and 65."
Or
"The distribution looks symmetrical around the average height of 62.55 and appears to fit the Normal distribution curve."
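The Std Err Mean and the 95% limits for the mean in the Moments table follow from the mean, standard deviation, and N. A short Python check, assuming the two-sided 95% Student t critical value for 39 degrees of freedom (about 2.0227, taken from t tables and hard-coded because the standard library has no t distribution):

```python
import math

# Values copied from the Moments table for height.
mean, std_dev, n = 62.55, 4.2423385, 40
t_crit = 2.0227  # assumed 95% two-sided t critical value for n - 1 = 39 df

std_err = std_dev / math.sqrt(n)  # Std Err Mean = Std Dev / sqrt(N)
lower = mean - t_crit * std_err   # Lower 95% Mean
upper = mean + t_crit * std_err   # Upper 95% Mean

print(f"Std Err Mean   {std_err:.7f}")  # Std Err Mean   0.6707726
print(f"Lower 95% Mean {lower:.4f}")    # Lower 95% Mean 61.1932
print(f"Upper 95% Mean {upper:.4f}")    # Upper 95% Mean 63.9068
```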
Probability Plot – Analyzing the Result

[Figure: Normal Quantile Plot with normal-quantile (-3 to 3) and probability (.01 to .99) scales and 95% confidence limits]

A Normal distribution will form a straight line that falls between the 95% CI limits shown. The lower axis shows the actual values (in the same units as the data).

So, how straight should the line be?

Just as histograms are never perfectly smooth, the line will never be perfectly straight even if the data are Normal. SAS-JMP places 95% CI limits on the diagram; if all the points fall within these limits, it is reasonable to treat the data as Normally distributed.
Probability Plot – Analyzing the Result Cont'd

• First example: the histogram looks possibly Normal, but maybe a little skewed to the right; the probability plot, however, indicates that the data can be assumed to be Normal.
• Second example: the histogram shows that the process is clearly skewed to the right, and the probability plot confirms this.
Histogram – Analyzing the Result Cont'd

[Figure: Distributions report for Reactor, with a histogram (40 to 90 scale), a box plot, and a Normal Quantile Plot]

Quantiles
100.0%  maximum   98.000
99.5%             98.000
97.5%             98.000
90.0%             93.700
75.0%   quartile  75.250
50.0%   median    62.000
25.0%   quartile  55.250
10.0%             46.200
2.5%              42.000
0.5%              42.000
0.0%    minimum   42.000

Moments
Mean              65.5
Std Dev           14.962318
Std Err Mean      2.6449892
Upper 95% Mean    70.894491
Lower 95% Mean    60.105509
N                 32
Sum Wgt           32
Sum               2096
Variance          223.87097
Skewness          0.7253727
Kurtosis          -0.073547
CV                22.843234
N Missing         0

Fitted Normal
Parameter Estimates
Type        Parameter  Estimate   Lower 95%   Upper 95%
Location    Mu         65.50000   60.10551    70.89449
Dispersion  Sigma      14.96232   11.99534    19.89210

Goodness-of-Fit Test
Shapiro-Wilk W Test
                       W          Prob<W
Normal(65.5,14.9623)   0.930088   0.0394

Reading the report:
• The box plot summarizes the distribution of the data and uses the same scale as the histogram.
• The mean, standard deviation, and sample size are summarized in the Moments table.
• The quartile information is used to generate the box plot.
• The confidence limits for the mean and standard deviation are summarized under Parameter Estimates.
• The Shapiro-Wilk W test assesses whether the distribution is Normal; a small Prob<W (below 0.05) is evidence against normality.
• The histogram shows data ranging from about 42 to 98, with the middle half of the readings between about 55 and 75. Here Prob<W = 0.0394 is below 0.05, so, together with the right skew (skewness 0.73), the test suggests that the data depart somewhat from the Normal curve.
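Several Moments entries are simple functions of one another, and the Shapiro-Wilk decision is a comparison of Prob<W with a significance level. A Python check using the reported Reactor values; the 0.05 level is the conventional choice and an assumption here:

```python
import math

# Values copied from the Reactor report.
n, total, std_dev = 32, 2096, 14.962318
prob_w = 0.0394  # Shapiro-Wilk Prob<W
alpha = 0.05     # assumed significance level

mean = total / n                  # Mean = Sum / N
variance = std_dev ** 2           # Variance = Std Dev squared
cv = 100 * std_dev / mean         # CV = 100 × Std Dev / Mean
std_err = std_dev / math.sqrt(n)  # Std Err Mean = Std Dev / sqrt(N)

print(mean)                # 65.5
print(round(variance, 4))  # 223.871
print(round(cv, 4))        # 22.8432
print(round(std_err, 5))   # 2.64499
print("reject normality" if prob_w < alpha else "no evidence against normality")
# reject normality
```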
Histograms vs. Boxplots vs. NPP

• Skewed data: the skewed shape is most easily seen in the histogram.
• Symmetrical data with outliers: the symmetrical shape and the outliers can easily be seen in all the graphs.

[Figure: side-by-side reports for a skewed data set (0 to 6 scale) and a symmetrical data set with outliers (140 to 160 scale), each including a Normal Quantile Plot]
Pareto Chart

• Definition
> Very similar to a histogram
> A bar chart for defect data
> Uses percentages to show importance
> Bars are ordered from the largest count to the smallest.
- The x-axis is the defect type.
- The y-axis is the count.
- The solid line is the cumulative percentage.
> Uses the 80/20 rule
- Separates the significant ‘vital few’ factors.

• Purpose
> Apply this tool to separate the vital few from the trivial many.
> Hence, the Pareto diagram is an effective tool for problem identification.

[Figure: example Pareto chart, "Reason for Postage Delay", with a count axis (0 to 250) and a cumulative percentage axis (0 to 100)]
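The ordering and cumulative-percentage line of a Pareto chart can be sketched in Python. The helper is hypothetical, and the delay reasons and counts below are invented for illustration, not the data behind the figure.

```python
def pareto_table(counts):
    """Return (category, count, percent, cumulative percent), largest count first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for category, count in ordered:
        running += count
        rows.append((category, count, 100 * count / total, 100 * running / total))
    return rows


delays = {"Weather": 45, "Wrong address": 120, "Other": 15, "Missing stamp": 20}  # hypothetical
for category, count, pct, cum_pct in pareto_table(delays):
    print(category, count, round(pct, 1), round(cum_pct, 1))
# Wrong address 120 60.0 60.0
# Weather 45 22.5 82.5
# Missing stamp 20 10.0 92.5
# Other 15 7.5 100.0
```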
Pareto Analysis in JMP

• Data preparation: JMP requires the data in two columns, one for the defect type and one for the frequency.
• Graph > Pareto Plot: Enter the failure type in Y, Cause, and the frequency in Freq.
Pareto – Analyzing the Result

[Figure: Pareto Plot (Freq: N) of failure causes, namely metallization, contamination, doping, miscellaneous, silicon defect, oxide defect, and corrosion, with a Count axis and a Cum Percent axis]

The cumulative frequency is plotted and shows the total number of failures encountered in the process, accumulated from the left of the chart.

The 80/20 Principle:
Reasons for failure are often found to conform to the 80/20 principle, which says that 80% of the failures are generally caused by around 20% of the causes.
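The 80/20 principle suggests a simple rule for picking the vital few: take causes from largest to smallest until the cumulative percentage reaches 80%. A Python sketch with a hypothetical helper and hypothetical cause names and counts (the 80% threshold is a rule of thumb, not a fixed law):

```python
def vital_few(counts, threshold=80.0):
    """Return the largest causes needed to reach `threshold` percent of all failures."""
    total = sum(counts.values())
    selected, running = [], 0
    for cause, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        selected.append(cause)
        running += count
        if 100 * running / total >= threshold:
            break
    return selected


failures = {"cause A": 50, "cause B": 30, "cause C": 8,
            "cause D": 6, "cause E": 4, "cause F": 2}  # hypothetical counts
few = vital_few(failures)
print(few)                                      # ['cause A', 'cause B']
print(f"{len(few)} of {len(failures)} causes")  # 2 of 6 causes
```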
Takeaways

• Identify the correct data types needed for a given data analysis
• Understand the need for data collection, and summarize data sets into meaningful information
• Align the basic statistical knowledge learned with the application of SAS-JMP software
• Use SAS-JMP as a standard software for statistical analysis
