Stock Market Analysis and Prediction

By:
Vivek Bhalgat
Vivek Bijlwan
(under Dr. Ratna Sanyal)
SATYAM happened…
(Figure: Satyam share price, with levels marked at Rs. 430, Rs. 117 and Rs. 6.30.)
In a span of 9 months, one could have multiplied one's money 18 times!

OR
One could have cashed in at Rs. 430, when others would sell at Rs. 6.30.
Why is Warren Buffett the richest man on Earth?
• In his own words
“The basic ideas of investing are to look at
stocks as business, use the market's
fluctuations to your advantage…..”

• So, what is a fluctuation?

• How can we identify it?
But how?
VALUE: INTRINSIC and EXTRINSIC

Intrinsic value, sometimes known as "fundamental value", is the value that remains in an option once all of its extrinsic value has been eroded by time decay. It is the part of the option's price that reflects the actual value of the underlying stock.
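The definition can be made concrete for a call option. This sketch assumes the standard textbook convention (not stated on the slide) that a call's intrinsic value is max(S - K, 0) for stock price S and strike K, and that extrinsic (time) value is the premium minus that amount. The helper names and prices are hypothetical and are not part of the authors' ICA method.

```python
def call_intrinsic_value(spot: float, strike: float) -> float:
    # A call only has intrinsic value when the stock trades above the strike.
    return max(spot - strike, 0.0)


def put_intrinsic_value(spot: float, strike: float) -> float:
    # A put only has intrinsic value when the stock trades below the strike.
    return max(strike - spot, 0.0)


def extrinsic_value(premium: float, intrinsic: float) -> float:
    # The part of the premium above intrinsic value; time decay erodes it,
    # so at expiry the option is worth only its intrinsic value.
    return premium - intrinsic


# Hypothetical numbers: stock at Rs. 120, strike Rs. 100, call premium Rs. 26.
iv = call_intrinsic_value(120.0, 100.0)
ev = extrinsic_value(26.0, iv)
print(f"intrinsic = {iv}, extrinsic = {ev}")
```

Here the Rs. 26 premium splits into Rs. 20 of intrinsic value and Rs. 6 of extrinsic value.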
Ways to separate the values

• Independent Component Analysis (ICA)

• Wavelet transforms
ICA
Blind Signal Separation (BSS) or Independent Component Analysis (ICA) is the
identification & separation of mixtures of sources with little prior
information.
• Applications include:

– Audio Processing
– Medical data
– Finance
– Array processing (beamforming)
– Coding
• … and most applications where Factor Analysis and PCA are currently used.
• While PCA seeks directions that represent the data best in a $\sum |x_0 - x|^2$ sense, ICA seeks directions that are most independent of each other.
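To see why the PCA direction need not match the independent directions, the sketch below borrows the mixing matrix from the Illustration slide and assumes independent unit-variance sources, so that Cov(x) = A A^T exactly. The leading eigenvector of that covariance is the best direction in the squared-error sense; it is compared with the columns of A, which are the directions ICA is after. The `angle_deg` helper is hypothetical.

```python
import numpy as np

# Mixing matrix from the "Illustration" slide; sources assumed independent
# with unit variance, so the covariance of x = A s is A A^T.
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
cov_x = A @ A.T

# PCA: the leading eigenvector of Cov(x) minimizes the squared reconstruction
# error. eigh returns eigenvalues in ascending order, so take the last column.
eigvals, eigvecs = np.linalg.eigh(cov_x)
pc1 = eigvecs[:, -1]


def angle_deg(v):
    # Direction angle in degrees, folded into [0, 180) because sign is arbitrary.
    return float(np.degrees(np.arctan2(v[1], v[0])) % 180.0)


print("PC1 direction: %.1f deg" % angle_deg(pc1))
print("Column 1 of A: %.1f deg" % angle_deg(A[:, 0]))
print("Column 2 of A: %.1f deg" % angle_deg(A[:, 1]))
```

The principal direction comes out at about 30 degrees, between the two columns (45 degrees and about 18.4 degrees), so PCA alone does not recover the mixing directions.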
We will concentrate on Time Series separation of Multiple Targets
The simple “Cocktail Party” Problem
(Figure: sources s1, s2 pass through the mixing matrix A to give the observations x1, x2.)
$x = As$, with n sources and m = n observations

Motivation

Two Independent Sources Mixed at Two Mics

$x_1(t) = a_{11} s_1 + a_{12} s_2$
$x_2(t) = a_{21} s_1 + a_{22} s_2$
The coefficients $a_{ij}$ depend on the distances of the microphones from the speakers.
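The two mixing equations can be simulated directly. In this sketch the two "speakers" (a sine wave and a square wave) and the coefficients a_ij are hypothetical stand-ins; in a real room the a_ij would be set by the microphone-speaker distances.

```python
import numpy as np

# Two hypothetical sources sampled over one second.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
s1 = np.sin(2 * np.pi * 5 * t)             # speaker 1: 5 Hz sine
s2 = np.sign(np.sin(2 * np.pi * 3 * t))    # speaker 2: 3 Hz square wave
s = np.vstack([s1, s2])                    # shape (2, T): one row per source

# Hypothetical mixing coefficients a_ij.
A = np.array([[0.8, 0.3],
              [0.4, 0.7]])

# Each microphone records a weighted sum of both speakers:
# x1(t) = a11 s1 + a12 s2,  x2(t) = a21 s1 + a22 s2
x = A @ s
```

Each row of `x` is one microphone's recording; ICA has to recover the rows of `s` from `x` alone.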
Motivation

Get the Independent Signals out of the Mixture


ICA Model (Noise Free)
• Use a statistical “latent variables” model (i.i.d. samples)
• Treat each $s_k$ as a random variable instead of a time signal
• $x_j = a_{j1}s_1 + a_{j2}s_2 + \dots + a_{jn}s_n$ for all j, i.e.
$x = As$
• The ICs s are latent variables and are unknown, and the mixing matrix A is also unknown
• Task: estimate A and s using only the observable random vector x
• Let's assume that the number of ICs equals the number of observed mixtures, and that A is square and invertible
• So after estimating A, we can compute $W = A^{-1}$ and hence
$s = Wx = A^{-1}x$
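The last step, going from an estimate of A back to the sources, is a single matrix inverse. This sketch assumes A is already known exactly (ICA's real job is to estimate it from x alone); the uniform sources are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical independent sources, 500 samples each.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 500))

# Mixing matrix, assumed known here for illustration.
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
x = A @ s

# Unmixing: W = A^{-1}, so s = W x.
W = np.linalg.inv(A)
s_hat = W @ x
```

In practice `np.linalg.solve(A, x)` gives the same result without forming the inverse explicitly.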
Illustration
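2 ICs with the distribution:

$p(s_i) = \begin{cases} \frac{1}{2\sqrt{3}} & \text{if } |s_i| \le \sqrt{3} \\ 0 & \text{otherwise} \end{cases}$
Zero mean and variance equal to 1

Mixing matrix A is

$A = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}$
The mixtures x1, x2 are then uniformly distributed on a parallelogram whose edges are in the directions of the columns of A.
So if we can estimate the joint pdf of x1 and x2 and then locate its edges, we can estimate A.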
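The illustration can be reproduced numerically: uniform sources on [-√3, √3] have zero mean and unit variance, and mixing maps the square support of s onto a parallelogram whose edges point along the columns of A. This is a minimal sketch; the sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
r = np.sqrt(3)

# Uniform on [-sqrt(3), sqrt(3)]: density 1/(2 sqrt(3)), zero mean, unit variance.
s = rng.uniform(-r, r, size=(2, n))

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
x = A @ s

# The support of s is the square with corners (+-r, +-r). A maps those corners
# to the corners of the parallelogram that contains every sample of x.
square = np.array([[r, r], [-r, r], [-r, -r], [r, -r]]).T
corners = A @ square

# Edge vectors between consecutive corners (closing the loop back to corner 0).
edges = np.diff(np.hstack([corners, corners[:, :1]]), axis=1)
print(edges / (2 * r))   # each column is +- a column of A
```

Every edge is a multiple of one column of A, which is why locating the edges of the joint pdf of x1 and x2 identifies A.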
Restrictions
• The si are statistically independent
– $p(s_1, s_2) = p(s_1)\,p(s_2)$
• Nongaussian distributions
– If s1 and s2 are gaussian with unit variance and A is orthogonal, the joint density of x1 and x2 is rotationally symmetric:
$p(x_1, x_2) = \frac{1}{2\pi} \exp\left(-\frac{x_1^2 + x_2^2}{2}\right)$
So it doesn't contain any information about the directions of the columns of the mixing matrix A, and A can't be estimated.
– If only one IC is gaussian, estimation is still possible.
Ambiguities
• Can't determine the variances (energies) of the ICs
– Since both s and A are unknown, any scalar multiplier in one of the sources can always be cancelled by dividing the corresponding column of A by the same scalar.
– So we fix the magnitudes of the ICs by assuming unit variance: $E\{s_i^2\} = 1$
– Only the ambiguity of sign remains

• Can't determine the order of the ICs
– The order of the terms can be changed freely because both s and A are unknown, so any IC can be called the first one.
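Both ambiguities can be demonstrated in a few lines: rescaling (including a sign flip) and reordering the sources, while compensating in the columns of A, leaves the observations x unchanged. The scaling matrix D and permutation P below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 1000))
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
x = A @ s

# Scaling: multiply source 1 by -2 and source 2 by 0.5 ...
D = np.diag([-2.0, 0.5])
# ... and permutation: swap the two sources.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

s_alt = P @ D @ s
# Compensate in A: divide the columns by the same scalars and swap them back.
A_alt = A @ np.linalg.inv(D) @ P.T
x_alt = A_alt @ s_alt
```

Since `A_alt @ s_alt = A D^{-1} P^T P D s = A s`, the pair (`A_alt`, `s_alt`) explains the data exactly as well as (`A`, `s`).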
ICA Principle (Non-Gaussian is Independent)
• The key to estimating A is non-gaussianity
• By the central limit theorem, the distribution of a sum of independent random variables tends toward a gaussian distribution.

(Figure: densities f(s1), f(s2) and f(x1) = f(s1 + s2).)

• Let $w^T$ be one of the rows of the matrix W, and define $z = A^T w$:

$y = w^T x = w^T A s = z^T s$

• y is a linear combination of the $s_i$, with weights given by the $z_i$.

• Since a sum of two independent random variables is more gaussian than either variable alone, $z^T s$ is more gaussian than any of the $s_i$, and it becomes least gaussian when it equals one of the $s_i$.
• So we could take w as a vector that maximizes the non-gaussianity of $w^T x$.
• Such a w would correspond to a z with only one nonzero component, so we get back one of the $s_i$.
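The argument can be checked with a brute-force scan: project the mixtures onto unit vectors w and keep the one whose projection w^T x is least gaussian. This sketch uses |kurtosis| as the non-gaussianity measure (introduced on the next slide) and assumes an orthogonal mixing (a rotation by a hypothetical angle theta0), so every projection has unit variance. A grid scan only works in 2-D; it is an illustration, not a practical estimator.

```python
import numpy as np


def kurt(y):
    # kurt(y) = E{y^4} - 3 (E{y^2})^2 ; zero for a gaussian, -1.2 for a uniform
    return np.mean(y**4) - 3.0 * np.mean(y**2) ** 2


rng = np.random.default_rng(0)
n = 100_000
# Two independent unit-variance uniform sources.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))

# Assumed orthogonal mixing: rotation by theta0, so x is already white.
theta0 = 0.5
R = np.array([[np.cos(theta0), -np.sin(theta0)],
              [np.sin(theta0),  np.cos(theta0)]])
x = R @ s

# Scan unit vectors w(theta) = (cos theta, sin theta) over a quarter turn
# and measure how non-gaussian each projection y = w^T x is.
thetas = np.linspace(0.0, np.pi / 2, 300, endpoint=False)
abs_kurt = np.array([abs(kurt(np.array([np.cos(t), np.sin(t)]) @ x)) for t in thetas])
best_theta = thetas[np.argmax(abs_kurt)]
print("most non-gaussian direction: %.3f rad (true %.3f)" % (best_theta, theta0))
```

Mixed directions give |kurt| as low as about 0.6, while the direction aligned with a source gives about 1.2, so the maximum lands on the mixing direction.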
Measures of Non-Gaussianity
• We need a quantitative measure of non-gaussianity for ICA estimation.
• Kurtosis: zero for a gaussian (but sensitive to outliers)
$\mathrm{kurt}(y) = E\{y^4\} - 3\,(E\{y^2\})^2$
• Entropy: largest for a gaussian (among variables of equal variance)
$H(y) = -\int f(y) \log f(y)\,dy$
• Negentropy: zero for a gaussian (but difficult to estimate)
$J(y) = H(y_{gauss}) - H(y)$, where $y_{gauss}$ is a gaussian variable with the same variance as y
• Approximations (for y with zero mean and unit variance)
$J(y) \approx \frac{1}{12} E\{y^3\}^2 + \frac{1}{48}\,\mathrm{kurt}(y)^2$
$J(y) \propto \left[E\{G(y)\} - E\{G(v)\}\right]^2$

• where v is a standard gaussian random variable and G is a nonquadratic function, for example:

$G(u) = \frac{1}{a} \log \cosh(a u)$
$G(u) = -\exp(-a u^2 / 2)$
Data Centering & Whitening
• Centering
$x = x' - E\{x'\}$
– This doesn't mean that ICA cannot estimate the mean; it just simplifies the algorithm.
– The ICs are then also zero mean, because:
$E\{s\} = W E\{x\}$
– After ICA, add $W E\{x'\}$ back to the zero-mean ICs.
• Whitening
– We transform x linearly so that the new vector $\tilde{x}$ is white (uncorrelated components with unit variance). This is done by EVD:
$\tilde{x} = E D^{-1/2} E^T x = E D^{-1/2} E^T A s = \tilde{A} s$
where $E\{x x^T\} = E D E^T$
So we only have to estimate the orthogonal matrix $\tilde{A}$.
– An orthogonal matrix has n(n-1)/2 degrees of freedom, instead of the n² of a general matrix. So for a large A we only have to estimate about half as many parameters. This greatly simplifies ICA.
• Reducing the dimension of the data (keeping only the dominant eigenvalues) while whitening also helps.
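The slides stop short of the estimation algorithm itself. A minimal end-to-end sketch of centering, EVD whitening and a search for maximally non-gaussian directions is below. The search uses the FastICA fixed-point iteration of Hyvärinen and Oja with g = tanh (the derivative of G(u) = log cosh u) and Gram-Schmidt deflation. This is an assumed, standard choice, not necessarily what the authors used. The helpers `center`, `whiten` and `fastica_deflation` are hypothetical names, and the data reuses the uniform sources and mixing matrix from the Illustration slide plus an arbitrary mean.

```python
import numpy as np


def center(x):
    # Subtract the mean of each observed mixture (one mixture per row).
    mean = x.mean(axis=1, keepdims=True)
    return x - mean, mean


def whiten(x):
    # EVD of the covariance E{x x^T} = E D E^T, then V = E D^{-1/2} E^T.
    cov = np.cov(x, bias=True)
    d, E = np.linalg.eigh(cov)
    V = E @ np.diag(d ** -0.5) @ E.T
    return V @ x, V


def fastica_deflation(z, n_components, max_iter=200, tol=1e-8, seed=0):
    # Fixed-point update w <- E{z g(w^T z)} - E{g'(w^T z)} w with g = tanh,
    # keeping each new w orthogonal to those already found (deflation).
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    W = np.zeros((n_components, n))
    for p in range(n_components):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(w @ z)
            g_prime = 1.0 - g**2
            w_new = (z * g).mean(axis=1) - g_prime.mean() * w
            w_new -= W[:p].T @ (W[:p] @ w_new)   # Gram-Schmidt step
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[p] = w
    return W


rng = np.random.default_rng(42)
T = 20_000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
x_raw = A @ s + np.array([[5.0], [-1.0]])   # x' with an arbitrary nonzero mean

x, mean = center(x_raw)
z, V = whiten(x)                 # white: covariance of z is the identity
W = fastica_deflation(z, 2)      # orthogonal unmixing directions for z
s_hat = W @ z                    # zero-mean ICs, up to sign and order
# The overall unmixing matrix is W V; add back (W V) E{x'} as on the slide.
s_hat_with_mean = s_hat + (W @ V) @ mean
```

Because z is white, the search only has to explore orthogonal matrices, which is exactly the simplification described above.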
RESULTS
• Data taken: TCS at BSE for the past 400 days.


Our data sources: BSE and NSE
(Figures: BSE, NSE, TCS)
Intrinsic & Extrinsic
(Figure: intrinsic and extrinsic components of TCS)
Other companies at BSE and NSE
(Figures: Wipro, HCL, Infosys)

Their ICs
(Figures: ICs of HCL, Infosys, Wipro)
(Figure: Correlation(TCS, Infosys))
And the others…
(Figures: Correlation(TCS, Wipro), Correlation(Infosys, Wipro))
With other sectors
(Figures: Correlation(TCS, JK Cement), Correlation(TCS, Reliance))


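Work after Mid-Sem
• Wavelet Transform:
Why Wavelet Transform?

Why not Fourier? It is time-invariant: it shows which frequencies are present, but not when they occur.

Why not the Short-Time Fourier Transform (STFT)? Heisenberg's uncertainty principle: time and frequency resolution cannot both be made arbitrarily fine, and a fixed window fixes one compromise for all frequencies.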
Wavelet Transform: Multi-Resolution Signal Analysis

Unlike the STFT, which has a constant resolution at all times and frequencies, the WT has good time and poor frequency resolution at high frequencies, and good frequency and poor time resolution at low frequencies.
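Multi-resolution analysis can be sketched with the Haar wavelet, the simplest wavelet. The slides do not say which wavelet was used, so Haar is an assumption here, and `haar_dwt`/`haar_idwt` are hypothetical helper names. Each level splits the current approximation into a coarser approximation (low frequencies, half the time resolution) and detail coefficients (high frequencies, finer time localization).

```python
import numpy as np


def haar_dwt(signal, levels):
    # Multi-level Haar DWT. Returns the coarsest approximation and a list of
    # detail arrays, finest (highest-frequency) level first.
    a = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        if a.size % 2:
            raise ValueError("length must be divisible by 2**levels")
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))   # high-pass half
        a = (even + odd) / np.sqrt(2)               # low-pass half, half as long
    return a, details


def haar_idwt(approx, details):
    # Invert haar_dwt level by level, coarsest level first.
    a = approx
    for d in reversed(details):
        even = (a + d) / np.sqrt(2)
        odd = (a - d) / np.sqrt(2)
        a = np.empty(even.size * 2)
        a[0::2], a[1::2] = even, odd
    return a


# Hypothetical test signal: one slow sine period plus a one-sample spike.
t = np.arange(64)
signal = np.sin(2 * np.pi * t / 64)
signal[40] += 3.0

approx, details = haar_dwt(signal, levels=3)
rebuilt = haar_idwt(approx, details)
```

The spike at sample 40 appears sharply in the finest detail level at coefficient 20, so the high-frequency event stays localized in time. The transform is orthonormal, so it preserves energy and inverts exactly.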

• Analysis and Evaluation of Results


References:
• http://users.rowan.edu/~polikar/WAVELETS/WTtutorial.html
• http://www.cis.hut.fi/aapo/papers/IJCNN99_tutorialweb/
• http://en.wikipedia.org/wiki/Independent_component_analysis
• Pierre Comon (1994): Independent Component Analysis: a new concept?, Signal Processing, Elsevier, 36(3):287–314 (the original paper describing the concept of ICA)
