Lab Stocks Clustering Jerarquico Daniel Ames Camayo
June 6, 2024
1 LAB GUIDE
1.1 SURNAMES, Given names: AMES CAMAYO, DANIEL VIDES
Date: October 6, 2023
2 STANDARD LIBRARIES:
[103]: import numpy as np
import pandas as pd
from matplotlib import style
style.use("ggplot")
3 CUSTOMIZED LIBRARIES:
[104]: import seaborn as sns
from scipy.stats import randint as sp_randint
from sklearn.decomposition import PCA
4 DATA EXTRACTION:
[105]: df = pd.read_csv("https://raw.githubusercontent.com/tatsath/fin-ml/master/Chapter%208%20-%20Unsup.%20Learning%20-%20Clustering/Data_MasterTemplate.
df
… … … … … …
2019-01-31 200.300000 102.700000 166.440000 385.620000 133.160000
2019-02-01 199.160000 103.060000 166.520000 387.430000 130.910000
2019-02-04 200.210000 103.420000 171.250000 397.000000 130.880000
2019-02-05 201.120000 103.900000 174.180000 410.180000 132.000000
2019-02-06 202.570000 104.960000 174.240000 411.110000 130.540000
2019-02-06 269.500000 53.790000 141.49 95.640000 71.470000
VZ WMT WBA
Date
2000-01-03 22.564221 47.337599 21.713237
2000-01-04 21.833915 45.566248 20.907354
[2 rows x 28 columns]
2000-01-03 43.003876 16.983583 23.52222 23.862240 … 38.135101
2000-01-04 40.577200 17.040950 24.89986 23.405167 … 36.846046
VZ WMT WBA
Date
2000-01-03 22.564221 47.337599 21.713237
2000-01-04 21.833915 45.566248 20.907354
[2 rows x 28 columns]
# Daily return
# Daily return (%)
datareturns = np.log(data / data.shift(1))
[109]: datareturns.head()
WMT WBA
Date
2000-01-03 NaN NaN
2000-01-04 -0.038138 -0.037821
2000-01-05 -0.023601 0.009050
2000-01-06 0.013913 -0.027399
2000-01-07 0.072806 0.025234
[5 rows x 28 columns]
[112]: datareturns.head(3)
WMT WBA
Date
2000-01-03 NaN NaN
2000-01-04 -0.037420 -0.037115
2000-01-05 -0.023325 0.009091
[3 rows x 28 columns]
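The two `datareturns.head()` outputs above differ slightly (for WMT, -0.038138 versus -0.037420), which is consistent with a log return in the first case and a simple percentage change in the second. The cell that produced the second output is not visible in the extracted text, so the use of `pct_change` below is an assumption. A minimal sketch, using the first two prices of VZ, WMT and WBA copied from the `head(2)` output earlier in the notebook:

```python
import numpy as np
import pandas as pd

# First two closing prices for VZ, WMT and WBA, copied from the head(2) output.
data = pd.DataFrame(
    {
        "VZ": [22.564221, 21.833915],
        "WMT": [47.337599, 45.566248],
        "WBA": [21.713237, 20.907354],
    },
    index=pd.to_datetime(["2000-01-03", "2000-01-04"]),
)

# Log return: ln(P_t / P_{t-1}); the first row is NaN (no previous price).
log_returns = np.log(data / data.shift(1))

# Simple return: P_t / P_{t-1} - 1, which is what DataFrame.pct_change computes.
# (Assumption: the notebook's second output came from a call like this.)
simple_returns = data.pct_change()

print(log_returns.round(6))
print(simple_returns.round(6))
```

For small daily moves the two measures are close, but log returns add up across days while simple returns do not, which is one common reason to prefer them for time series work.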
[113]: Date
2000-01-03 NaN
2000-01-04 -0.031058
2000-01-05 0.019281
Name: DJIA, dtype: float64
WBA DJIA
Date
2000-01-04 -0.037115 -0.031058
2000-01-05 0.009091 0.019281
2000-01-06 -0.027027 0.002579
[3 rows x 29 columns]
2000-01-06 3.995818 -0.021248 -3.400666 0.108313 1.770951 2.389941
WBA DJIA
Date
2000-01-04 -2.167895 -2.739489
2000-01-05 0.502376 1.631646
2000-01-06 -1.584911 0.181378
[3 rows x 29 columns]
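from sklearn.preprocessing import StandardScaler
# Standardize each column to zero mean and unit variance
scaler = StandardScaler()
scaled_df = scaler.fit_transform(df)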
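To make the standardization step concrete, the sketch below checks on a small matrix that `StandardScaler` is equivalent to subtracting each column's mean and dividing by its population standard deviation. The toy values in `X` are hypothetical and are not taken from the lab's data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy matrix: 4 observations (rows) of 2 features (columns).
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # returns a NumPy array, not a DataFrame

# Same result by hand: subtract the column mean, divide by the
# population standard deviation (ddof=0, NumPy's default).
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

Because `fit_transform` returns a plain array, the column names and the date index are lost; wrapping the result back into a `pd.DataFrame` with the original `index` and `columns` is one way to keep them.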
linked
[118]: array([[7.69000000e+02, 7.70000000e+02, 1.87207085e-02, 2.00000000e+00],
[1.50200000e+03, 1.50300000e+03, 2.57043425e-02, 2.00000000e+00],
[9.07000000e+02, 9.08000000e+02, 2.70617818e-02, 2.00000000e+00],
…,
[9.60100000e+03, 9.60200000e+03, 1.36129958e+02, 3.28300000e+03],
[9.59500000e+03, 9.60300000e+03, 1.63674978e+02, 1.52100000e+03],
[9.60400000e+03, 9.60500000e+03, 4.26367861e+02, 4.80400000e+03]])
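The array printed above follows SciPy's linkage-matrix layout: each row records one merge as two cluster ids, the distance at which they were merged, and the number of original observations in the new cluster. Original observations are numbered 0 to n-1, and the cluster created by row i receives id n+i. The last row's count of 4804 therefore means that 4804 observations were clustered. The sketch below shows the same layout on a tiny example; the toy data and the `"ward"` method are assumptions, since the `linkage` call itself is not visible in the extracted notebook.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: 4 one-dimensional observations, indexed 0..3.
X = np.array([[0.0], [1.0], [5.0], [7.0]])

# "ward" is an assumed method, chosen only for illustration.
linked = linkage(X, method="ward")

n_obs = linked.shape[0] + 1  # n observations always give n - 1 merges
for i, (a, b, dist, count) in enumerate(linked):
    print(
        f"step {i}: merge {int(a)} and {int(b)} at distance {dist:.3f} "
        f"-> new cluster {n_obs + i} with {int(count)} observations"
    )
```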
[120]: from scipy.cluster.hierarchy import dendrogram, linkage
dendrogram
[121]: array([[ 5. , 9. , 0.27207065, 2. ],
[10. , 15. , 0.45595649, 2. ],
[ 1. , 30. , 0.53472591, 3. ],
[ 6. , 13. , 0.55217128, 2. ],
[ 0. , 23. , 0.59795924, 2. ],
[18. , 32. , 0.63802964, 3. ],
[ 4. , 33. , 0.6421695 , 3. ],
[17. , 20. , 0.65092769, 2. ],
[14. , 36. , 0.6956485 , 3. ],
[12. , 34. , 0.71936254, 4. ],
[ 3. , 35. , 0.73536588, 4. ],
[ 8. , 31. , 0.76549928, 4. ],
[ 7. , 21. , 0.78186826, 2. ],
[11. , 26. , 0.78217715, 2. ],
[22. , 40. , 0.79465521, 5. ],
[39. , 43. , 0.83270992, 9. ],
[19. , 42. , 0.91368815, 3. ],
[ 2. , 38. , 0.91841374, 5. ],
[37. , 41. , 0.93692083, 5. ],
[29. , 44. , 0.9489942 , 11. ],
[27. , 45. , 0.97363533, 4. ],
[25. , 47. , 0.98299793, 6. ],
[16. , 49. , 1.01064171, 5. ],
[24. , 50. , 1.01076057, 7. ],
[51. , 52. , 1.0574304 , 12. ],
[46. , 48. , 1.06366375, 16. ],
[53. , 54. , 1.13906782, 28. ],
[28. , 55. , 1.46627118, 29. ]])
[122]: 0.7996812457331596
dendrogram(
    Z,
    leaf_rotation=90., # rotates the x axis labels
    leaf_font_size=8., # font size for the x axis labels
    labels = corr.columns
)
pylab.yticks(fontsize=ticksize)
pylab.xticks(rotation=-90, fontsize=ticksize)
plt.savefig('dendogram_'+'DJIA'+'.png')
plt.show()
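The plotting cell above labels the leaves with `corr.columns`, and `Z` has 28 rows, which matches 29 columns being merged. The cells that built `corr` and `Z` are not visible in the extracted text, so the following is only a sketch of one way such a pipeline could look. The synthetic returns, the made-up ticker names, and the `"average"` linkage method are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical synthetic returns: 250 days for 5 made-up tickers
# (not the lab's data).
rng = np.random.default_rng(0)
common = rng.normal(size=(250, 1))
returns = pd.DataFrame(
    0.5 * common + rng.normal(size=(250, 5)),
    columns=["AAA", "BBB", "CCC", "DDD", "EEE"],
)

corr = returns.corr()

# Each row of the correlation matrix is treated as one observation.
# "average" is an assumed linkage method.
Z = linkage(corr.values, method="average")

# no_plot=True returns the tree layout without drawing anything,
# so the left-to-right leaf order can be inspected directly.
tree = dendrogram(Z, labels=corr.columns.tolist(), no_plot=True)
print(Z.shape)
print(tree["ivl"])
```

Dropping `no_plot=True` draws the tree on the current Matplotlib axes, as in the cell above.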
#INTERPRETATION OF RESULTS:
[124]: # Interpretation based on the dendrogram and the clusters formed
# The optimal number of clusters can be identified and the data labeled accordingly.
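Labeling the data from a linkage matrix can be done with SciPy's `fcluster`, either by asking for a maximum number of clusters or by cutting the tree at a chosen height read off the dendrogram. The sketch below uses hypothetical toy points and an assumed `"ward"` linkage; the thresholds are illustrative, not values taken from the lab.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data: two well-separated groups of 2-D points.
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

Z = linkage(X, method="ward")  # assumed method

# Option 1: ask for at most 2 flat clusters; labels start at 1.
labels_k = fcluster(Z, t=2, criterion="maxclust")

# Option 2: cut the tree at height 1.0; merges above that distance
# are not applied.
labels_d = fcluster(Z, t=1.0, criterion="distance")

print(labels_k)
print(labels_d)
```

The resulting label array has one entry per observation, in the same order as the rows passed to `linkage`, so it can be attached to the original table as a new column.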
Date …
2000-01-03 43.003876 16.983583 23.522220 23.862240 … 4.701180
2000-01-04 40.577200 17.040950 24.899860 23.405167 … 4.445214
2000-01-05 40.895453 17.228147 25.781550 24.569179 … 4.702157
2000-01-06 39.781569 17.210031 24.899860 25.958680 … 4.677733
2000-01-07 42.128682 18.342270 24.506249 25.882501 … 4.677733
[5 rows x 29 columns]