Statistics Helps Numerical Evidence Problem
Statistical methods
2. Pattern recognition
Histogram, scatter plot
E.g. a scatter plot shows the relationship between two variables
Box plot: finding outliers
3. Association analysis: finding related items
4. Predictive Models
Regression: y = a + bx (the output is a number)
a) Logistic Regression- Why
b) Neural networks: multiple outputs
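The regression line y = a + bx can be fitted by least squares. The sketch below is one minimal way to do that; the function name fit_line and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

With real data the points will not lie exactly on a line, and a and b are then the values that minimise the sum of squared vertical distances.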
Types
1. Descriptive statistics
2. Inferential statistics
Draw conclusions about the population from a sample
If I know the patterns in the population, I can expect the same patterns in future data
But how do we find the patterns in a population through sampling and sample parameters?
Data sources
Type of data
Continuous-
Data objects
Rows -> data objects
Columns -> attributes
Numeric: quantitative
Interval: split into equal-sized ranges; no true zero point
Ratio: has a true zero point
Data->information->knowledge->wisdom
Data->information (descriptive)
X axis -> the data values
Y axis -> how much data (the count in each bin)
The data are split into bins (the bin is the width of the bar)
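The binning behind a histogram can be sketched as follows. This is only an illustration: the function name histogram_counts, the bin width, and the data are assumptions, not part of the notes.

```python
def histogram_counts(data, bin_width, start):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = {}
    for x in data:
        k = int((x - start) // bin_width)   # index of the bin that holds x
        left = start + k * bin_width        # left edge of that bin (x axis)
        counts[left] = counts.get(left, 0) + 1  # bar height (y axis)
    return dict(sorted(counts.items()))

data = [1, 2, 2, 3, 5, 7, 8, 8, 9]
counts = histogram_counts(data, bin_width=5, start=0)
print(counts)  # {0: 4, 5: 5}
```

Changing bin_width changes the shape of the histogram, so it is worth trying more than one width.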
Cumulative distribution function: how many (or what proportion of) observations are <= a given value
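For a sample, the cumulative distribution can be computed directly by counting. A minimal sketch, with the helper name ecdf and the data chosen only for illustration:

```python
def ecdf(data, value):
    """Proportion of observations that are <= value."""
    return sum(1 for x in data if x <= value) / len(data)

data = [1, 2, 2, 3, 5]
print(ecdf(data, 2))  # 0.6, because 3 of the 5 observations are <= 2
```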
Cons: affected by outliers
Median:
Sort the data in ascending order and take the middle value, which is the 50% point
Odd number of values -> median = value at position (n+1)/2
Cons: even number of values -> median = average of the two middle values
Pros: not affected by outliers
Mode:
The value with the maximum frequency
Cons: several values can share the highest frequency
Pros: not affected by outliers
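The effect of an outlier on the mean, median, and mode can be checked with Python's standard statistics module. The data below are made up, and the helper name summary is only for illustration.

```python
import statistics

def summary(values):
    """Return (mean, median, list of modes) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.multimode(values),  # all values with the highest frequency
    )

data = [2, 3, 3, 4, 5]
with_outlier = data + [100]  # add one extreme value

print(summary(data))          # mean 3.4, median 3, mode [3]
print(summary(with_outlier))  # mean 19.5, median 3.5, mode [3]
```

The single outlier moves the mean from 3.4 to 19.5, while the median and mode barely change. The second list has an even number of values, so its median is the average of the two middle values, 3 and 4.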
MEASURES OF DISPERSION
Once you know the central tendency (CT), the next step is to find how the data are spread
around it
IQR
Interquartile range -> remove the top and bottom 25% and consider only the middle 50%
IQR = Q3 - Q1
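A minimal sketch of the IQR calculation using statistics.quantiles from the standard library. Textbooks and software use slightly different quartile conventions; the "inclusive" method here is one assumption among several, and the data are made up.

```python
import statistics

def iqr(data):
    """Interquartile range: Q3 - Q1, the spread of the middle 50% of the data."""
    q1, _q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(iqr(data))  # 4.0, since Q1 = 3 and Q3 = 7
```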
Standard Deviation
Coefficient of variation
CV = S / mean (standard deviation divided by the mean); gives the variation as a proportion of the mean
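The coefficient of variation divides the standard deviation by the mean, which makes the spread of data sets with different units or scales comparable. A sketch, assuming S is the sample standard deviation; the function name and the data are illustrative only.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (mean must be non-zero)."""
    return statistics.stdev(data) / statistics.mean(data)

heights_cm = [150, 160, 170, 180, 190]
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]

# The same data in different units has the same CV
print(coefficient_of_variation(heights_cm))
print(coefficient_of_variation(heights_m))
```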
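Empirical rule (bell-shaped, approximately normal data)
68% -> within +/- 1 SD of the mean
95% -> within +/- 2 SD
99.7% -> within +/- 3 SD
Chebyshev's rule (any distribution): at least 1 - 1/k^2 of the data lie within +/- k SD (k > 1)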
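The 68 / 95 / 99.7 percentages hold for normally distributed data, while Chebyshev's inequality gives a weaker guarantee, at least 1 - 1/k^2, that holds for any distribution. The sketch below compares the two; the function names are chosen for illustration.

```python
import math

def normal_within(k):
    """Probability that a normal value lies within k SD of its mean."""
    return math.erf(k / math.sqrt(2))

def chebyshev_bound(k):
    """Chebyshev lower bound on the share of data within k SD (any distribution)."""
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(k, round(normal_within(k), 4), round(chebyshev_bound(k), 4))
```

For k = 2 the normal figure is about 95%, while Chebyshev only guarantees 75%; for k = 1 Chebyshev guarantees nothing.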
Xmin
Q1
Q2(Median)
Q3
Xmax
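These five values (the ones a box plot draws) can be computed in a few lines. A sketch using the standard library; the "inclusive" quartile method is an assumption, and other conventions give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(data):
    """Return (Xmin, Q1, Q2, Q3, Xmax) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

data = [7, 1, 9, 3, 5, 2, 8, 4, 6]  # order does not matter
print(five_number_summary(data))  # (1, 3.0, 5.0, 7.0, 9)
```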
Distribution
Left skewed
Symmetric
Right skewed
Outliers: values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
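The 1.5 * IQR rule is the usual box plot rule for flagging outliers: a value is flagged if it lies more than 1.5 * IQR below Q1 or above Q3. A minimal sketch, with the function name, the quartile method, and the data all chosen for illustration:

```python
import statistics

def iqr_outliers(data, factor=1.5):
    """Return the values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = q3 - q1
    lower = q1 - factor * spread
    upper = q3 + factor * spread
    return [x for x in data if x < lower or x > upper]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(iqr_outliers(data))  # [50]
```

Here Q1 = 3.25 and Q3 = 7.75, so the fences are -3.5 and 14.5, and only 50 falls outside them.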