Stock Market Analysis and Prediction

STOCK MARKET ANALYSIS AND PREDICTION
By:
Vivek Bhalgat
Vivek Bijlwan
(under Dr. Ratna Sanyal)
SATYAM happened…
Rs. 430
Rs. 117
Rs. 6.30
In a span of 9 months, one could have multiplied one's money 18 times!!

OR
One could have cashed in at Rs. 430, when others would sell at Rs. 6.30
Why is Warren Buffett the richest man on Earth?
• In his own words:
“The basic ideas of investing are to look at stocks as business, use the market's fluctuations to your advantage…”
• So, what is a fluctuation?

• How to identify it?
But How?
VALUE
INTRINSIC EXTRINSIC
Intrinsic value, sometimes known as "Fundamental Value", is the value that remains in an option when all of its extrinsic value has diminished due to time decay. It is the actual value of a stock that has been built into the price of the option.
Ways to separate the values
• Independent Component Analysis (ICA)
• Wavelet transforms
ICA
Blind Signal Separation (BSS) or Independent Component Analysis (ICA) is the
identification & separation of mixtures of sources with little prior
information.
• Applications include:
– Audio Processing
– Medical data
– Finance
– Array processing (beamforming)
– Coding
• … and most applications where Factor Analysis and PCA are currently used.
• While PCA seeks directions that represent the data best in a least-squares sense (minimizing $\sum |x_0 - x|^2$), ICA seeks directions that are most independent from each other.
We will concentrate on Time Series separation of Multiple Targets
The simple “Cocktail Party” Problem
[Figure: sources s1, s2 pass through the mixing matrix A to give observations x1, x2]
x = As
n sources, m = n observations

Motivation
Two Independent Sources Mixture at two Mics
$x_1(t) = a_{11} s_1 + a_{12} s_2$
$x_2(t) = a_{21} s_1 + a_{22} s_2$
The coefficients $a_{ij}$ depend on the distances of the microphones from the speakers.
Motivation
Get the Independent Signals out of the Mixture

ICA Model (Noise Free)
• Use a statistical "latent variables" model (IID samples)
• Random variable $s_k$ instead of a time signal
• $x_j = a_{j1} s_1 + a_{j2} s_2 + \dots + a_{jn} s_n$, for all j
x = As
• The ICs s are latent variables and are unknown, and the mixing matrix A is also unknown
• Task: estimate A and s using only the observable random vector x
• Let us assume that the number of ICs equals the number of observable mixtures, and that A is square and invertible
• So after estimating A, we can compute $W = A^{-1}$ and hence
$s = Wx = A^{-1}x$
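The final step of the model (recovering s once A is known) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the sources are synthetic uniform samples and the matrix `A` is a made-up example that the code is allowed to see, whereas in real ICA A is unknown and has to be estimated first.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical independent, non-Gaussian sources (uniform, zero mean).
s = rng.uniform(-1.0, 1.0, size=(2, 1000))

# Made-up square, invertible mixing matrix (an assumption for this sketch).
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])

x = A @ s                  # observed mixtures, x = As
W = np.linalg.inv(A)       # unmixing matrix, W = A^-1
s_hat = W @ x              # recovered sources, s = Wx

print("max recovery error:", np.max(np.abs(s_hat - s)))
```

The recovery is exact only because A is known here; the rest of the deck is about estimating W from x alone.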
Illustration
2 ICs with distribution:
$p(s_i) = \begin{cases} \frac{1}{2\sqrt{3}} & \text{if } |s_i| \le \sqrt{3} \\ 0 & \text{otherwise} \end{cases}$
Zero mean and variance equal to 1
Mixing matrix A is
$A = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}$
The edges of the parallelogram are in the directions of the columns of A.
So if we can estimate the joint pdf of x1 & x2 and then locate the edges, we can estimate A.
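The illustration can be reproduced numerically. The sketch below assumes the slide's setup (two uniform ICs on [-sqrt(3), sqrt(3)] and the mixing matrix with rows (2, 3) and (2, 1)); the sample size and variable names are choices made here. It maps the corners of the source square through A and checks that the edges of the resulting parallelogram are parallel to the columns of A.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Uniform sources on [-sqrt(3), sqrt(3)]: zero mean, unit variance.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
x = A @ s

# Corners of the source square, one per column, mapped into mixture space.
corners_s = np.sqrt(3) * np.array([[1.0, 1.0, -1.0, -1.0],
                                   [1.0, -1.0, -1.0, 1.0]])
corners_x = A @ corners_s

# Adjacent corners differ in one source only, so each edge of the
# parallelogram is a multiple of one column of A.
edge_1 = corners_x[:, 0] - corners_x[:, 1]   # only s2 changes: parallel to A[:, 1]
edge_2 = corners_x[:, 1] - corners_x[:, 2]   # only s1 changes: parallel to A[:, 0]

print("source variances:", s.var(axis=1))
print("edge_1 / A[:, 1]:", edge_1 / A[:, 1])
print("edge_2 / A[:, 0]:", edge_2 / A[:, 0])
```

Locating those edges from data alone is hard in practice, which is why the following slides turn to non-Gaussianity instead.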
Restrictions
• si are statistically independent
– p(s1,s2) = p(s1)p(s2)
• Nongaussian distributions
– The joint density of unit-variance Gaussian s1 & s2 is symmetric:
$p(x_1, x_2) = \frac{1}{2\pi} \exp\left(-\frac{x_1^2 + x_2^2}{2}\right)$
So it doesn't contain any information about the directions of the columns of the mixing matrix A, and A can't be estimated.
– If only one IC is Gaussian, the estimation is still possible.
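A short derivation of why Gaussian sources defeat ICA, assuming (as is the case after whitening) that the mixing matrix A is orthogonal and that the sources are independent, zero-mean, unit-variance Gaussians. Since x = As gives s = A^T x and |det A^T| = 1:

```latex
\begin{aligned}
p_s(s_1, s_2) &= \frac{1}{2\pi} \exp\left(-\frac{s_1^2 + s_2^2}{2}\right)
               = \frac{1}{2\pi} \exp\left(-\frac{\lVert s \rVert^2}{2}\right), \\
p_x(x) &= p_s(A^{T} x)\,\lvert \det A^{T} \rvert
        = \frac{1}{2\pi} \exp\left(-\frac{\lVert A^{T} x \rVert^2}{2}\right)
        = \frac{1}{2\pi} \exp\left(-\frac{\lVert x \rVert^2}{2}\right).
\end{aligned}
```

The result does not depend on A, so every orthogonal mixing matrix produces exactly the same density for x.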
Ambiguities
• Can't determine the variances (energies) of the ICs
– Both s & A are unknown, so any scalar multiple in one of the sources can always be cancelled by dividing the corresponding column of A by it.
– Fix the magnitudes of the ICs by assuming unit variance: $E\{s_i^2\} = 1$
– Only the ambiguity of sign remains
• Can't determine the order of the ICs
– The terms in the sum can be freely reordered, because both s and A are unknown. So we can call any IC the first one.
ICA Principle (Non-Gaussian is Independent)
• Key to estimating A is non-gaussianity
• The distribution of a sum of independent random variables tends toward a Gaussian distribution (by the Central Limit Theorem).
[Figure: densities f(s1), f(s2) and f(x1) = f(s1 + s2)]
• Consider $y = w^T x = w^T A s = z^T s$, where w is one of the rows of the matrix W and $z = A^T w$.
• y is a linear combination of the $s_i$, with weights given by $z_i$.
• Since a sum of two independent random variables is more Gaussian than either of them, $z^T s$ is more Gaussian than any single $s_i$, and becomes least Gaussian when it equals one of the $s_i$.
• So we could take w as a vector which maximizes the non-gaussianity of $w^T x$.
• Such a w would correspond to a z with only one nonzero component. So we get back the $s_i$.
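The principle can be checked numerically with a small sketch. It assumes two synthetic uniform, unit-variance sources and uses the absolute value of kurtosis as the non-Gaussianity measure; both are choices made here for illustration. The code scans unit-norm weight vectors z = (cos t, sin t) and looks at where z^T s is least and most Gaussian.

```python
import numpy as np

def kurtosis(y):
    """Kurtosis E{y^4} - 3 (E{y^2})^2 of a centered sample."""
    y = y - y.mean()
    return np.mean(y**4) - 3.0 * np.mean(y**2) ** 2

rng = np.random.default_rng(0)

# Two hypothetical independent sources: uniform, zero mean, unit variance.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 200_000))

# Scan weight vectors z = (cos t, sin t), one degree at a time from 0 to 90.
angles = np.linspace(0.0, np.pi / 2, 91)
abs_kurt = np.array([
    abs(kurtosis(np.cos(t) * s[0] + np.sin(t) * s[1])) for t in angles
])

best = angles[np.argmax(abs_kurt)]    # least Gaussian combination
worst = angles[np.argmin(abs_kurt)]   # most Gaussian combination
print(f"least Gaussian near {np.degrees(best):.0f} degrees")
print(f"most Gaussian near {np.degrees(worst):.0f} degrees")
```

The least Gaussian directions sit at the ends of the scan, where z has a single nonzero component, and the most Gaussian direction sits near the equal-weight mixture in the middle.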
Measures of Non-Gaussianity
• We need to have a quantitative measure of non-gaussianity for ICA estimation.
• Kurtosis: gauss = 0 (sensitive to outliers)
$\mathrm{kurt}(y) = E\{y^4\} - 3\,(E\{y^2\})^2$
• Entropy: gauss = largest (among variables of equal variance)
$H(y) = -\int f(y) \log f(y)\, dy$
• Neg-entropy: gauss = 0 (difficult to estimate)
$J(y) = H(y_{gauss}) - H(y)$
• Approximations
$J(y) \approx \frac{1}{12} E\{y^3\}^2 + \frac{1}{48} \mathrm{kurt}(y)^2$
$J(y) \propto [E\{G(y)\} - E\{G(v)\}]^2$
• where v is a standard Gaussian random variable and G is one of:
$G(y) = \frac{1}{a} \log\cosh(a y)$
$G(y) = -\exp(-a y^2 / 2)$
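A minimal sketch of two of these measures, kurtosis and the log cosh negentropy approximation, follows. The function names, the default a = 1, the standardization of the input, and the use of Gauss-Hermite quadrature for E{G(v)} are all choices made here, not taken from the slides. The negentropy value is returned only up to a positive constant factor.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def standardize(y):
    """Center to zero mean and scale to unit variance."""
    return (y - y.mean()) / y.std()

def kurtosis(y):
    """kurt(y) = E{y^4} - 3 (E{y^2})^2 on the standardized sample."""
    y = standardize(y)
    return np.mean(y**4) - 3.0 * np.mean(y**2) ** 2

def log_cosh(u, a=1.0):
    """G(u) = (1/a) log cosh(a u), computed stably via logaddexp."""
    return (np.logaddexp(a * u, -a * u) - np.log(2.0)) / a

def negentropy_approx(y, a=1.0):
    """(E{G(y)} - E{G(v)})^2 with v standard Gaussian, up to a constant."""
    y = standardize(y)
    # E{G(v)} by Gauss-Hermite quadrature with weight exp(-x^2 / 2).
    nodes, weights = hermegauss(64)
    expected_gauss = np.sum(weights * log_cosh(nodes, a)) / np.sqrt(2.0 * np.pi)
    return (np.mean(log_cosh(y, a)) - expected_gauss) ** 2

rng = np.random.default_rng(0)
gauss = rng.standard_normal(200_000)
unif = rng.uniform(-1.0, 1.0, size=200_000)

print("kurtosis   gauss / uniform:", kurtosis(gauss), kurtosis(unif))
print("negentropy gauss / uniform:", negentropy_approx(gauss), negentropy_approx(unif))
```

Both measures come out close to zero for the Gaussian sample and clearly away from zero for the uniform one, which is the property ICA relies on.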
Data Centering & Whitening
• Centering
x = x' - E{x'}
– This doesn't mean that ICA can't estimate the mean; it just simplifies the algorithm.
– The ICs are also zero mean, because E{s} = W E{x}
– After ICA, add W E{x'} to the zero-mean ICs
• Whitening
– We transform x linearly so that the components of x~ are white (uncorrelated, with unit variance). This is done by eigenvalue decomposition (EVD):
$\tilde{x} = E D^{-1/2} E^T x = E D^{-1/2} E^T A s = \tilde{A} s$, where $E\{x x^T\} = E D E^T$
– So we only have to estimate the orthonormal matrix $\tilde{A}$
– An orthonormal matrix has n(n-1)/2 degrees of freedom, versus n² for a general matrix. So for large-dimensional A we have to estimate only about half as many parameters. This greatly simplifies ICA.
• Reducing the dimension of the data (keeping the dominant eigenvalues) while whitening also helps.
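Centering and EVD-based whitening can be sketched as follows. The synthetic sources, the example mixing matrix, and the added offset are assumptions made for this illustration; `np.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unit-variance sources and an example mixing matrix (assumed).
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 50_000))
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
x_raw = A @ s + np.array([[5.0], [-1.0]])   # mixtures with a nonzero mean

# Centering: x = x' - E{x'}
mean = x_raw.mean(axis=1, keepdims=True)
x = x_raw - mean

# Whitening by EVD of the covariance: E{x x^T} = E D E^T
C = np.cov(x)
d, E = np.linalg.eigh(C)                    # eigenvalues d, eigenvectors in columns of E
V = E @ np.diag(d ** -0.5) @ E.T            # whitening matrix E D^(-1/2) E^T
x_white = V @ x

# The effective mixing matrix after whitening should be close to orthonormal.
A_tilde = V @ A

print("covariance of whitened data:")
print(np.round(np.cov(x_white), 3))
print("A_tilde @ A_tilde.T:")
print(np.round(A_tilde @ A_tilde.T, 2))
```

The whitened covariance is the identity by construction, while A_tilde is orthonormal only approximately, because the covariance is estimated from a finite sample.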
RESULTS
• Data taken: TCS at BSE for the past 400 days.
• Our data sources: BSE and NSE
[Figures: TCS at BSE and at NSE]
• Intrinsic & Extrinsic
[Figure: intrinsic and extrinsic components of TCS]
• Other companies at BSE and NSE: Wipro, HCL, Infosys
[Figures: Wipro, HCL, Infosys]
• Their ICs
[Figures: ICs of HCL, Infosys, Wipro]
• Correlations
[Figures: Correlation(TCS, Infosys), Correlation(TCS, Wipro), Correlation(Infosys, Wipro)]
• With other sectors
[Figures: Correlation(TCS, JK Cement), Correlation(TCS, Reliance)]

Work after Mid-Sem
• Wavelet Transform:
Why Wavelet Transform?
Why not Fourier? It is time invariant: it tells which frequencies are present, but not when they occur.

Why not the Short-Time Fourier Transform (STFT)? Heisenberg's Uncertainty Principle: with a fixed window, time and frequency cannot both be resolved sharply.
Wavelet Transform: Multi-Resolution Signal Analysis
Unlike the STFT, which has a constant resolution at all times and frequencies, the WT has good time and poor frequency resolution at high frequencies, and good frequency and poor time resolution at low frequencies.
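The multi-resolution idea can be sketched with the Haar wavelet, the simplest wavelet. The slides do not name a wavelet, so Haar is an assumption made here, as are the function names and the synthetic test signal (a slow sine wave with one short spike).

```python
import numpy as np

def haar_step(signal):
    """One analysis level: pairwise averages (approx) and differences (detail)."""
    pairs = np.asarray(signal, dtype=float).reshape(-1, 2)   # needs even length
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one level of haar_step."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

def haar_decompose(signal, levels):
    """Repeatedly split the approximation; returns (coarse approx, details per level)."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

# Synthetic signal: a slow sine wave plus one short spike at sample 40.
t = np.arange(64)
signal = np.sin(2.0 * np.pi * t / 32.0)
signal[40] += 2.0

approx, details = haar_decompose(signal, levels=3)
print("coefficients per detail level:", [len(d) for d in details])
print("largest finest-level detail at pair index:", int(np.argmax(np.abs(details[0]))))
```

The finest detail level has the most coefficients, so it pins the spike down in time, while the coarse approximation keeps only a few coefficients that describe the slow oscillation.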
• Analysis and Evaluation of Results

References:
• http://users.rowan.edu/~polikar/WAVELETS/WTtutorial.html
• http://www.cis.hut.fi/aapo/papers/IJCNN99_tutorialweb/
• http://en.wikipedia.org/wiki/Independent_component_analysis
• Pierre Comon (1994): Independent Component Analysis: a new concept?, Signal Processing, Elsevier, 36(3):287-314 (the original paper describing the concept of ICA)
