Reinforcement Learning - Assignment 2

07/11/2023, 13:05 HW Assignment 2

Salman Yousaf salmany@uchicago.edu


In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # used for the heatmap in Question 6

from IPython.display import display, HTML

import gym
from gym import spaces

Question 1
In [2]: total_steps = 100
n = 8
# One-period rating transition matrix (entered in percent, converted to probabilities)
Pmat = np.array([[90.81,8.33,0.68,0.06,0.08,0.02,0.01,0.01],
                 [0.7,90.65,7.79,0.64,0.06,0.13,0.02,0.01],
                 [0.09,2.27,91.05,5.52,0.74,0.26,0.01,0.06],
                 [0.02,0.33,5.95,85.93,5.3,1.17,1.12,0.18],
                 [0.03,0.14,0.67,7.73,80.53,8.84,1,1.06],
                 [0.01,0.11,0.24,0.43,6.48,83.46,4.07,5.2],
                 [0.21,0,0.22,1.3,2.38,11.24,64.86,19.79],
                 [0,0,0,0,0,0,0,100],
                 ],dtype=float) / 100

# P[t] holds the (t+1)-step transition matrix, i.e. Pmat raised to the power t+1
P = np.zeros((total_steps,n,n),dtype=np.float64)
P[0] = Pmat
for t in range(1,total_steps):
    P[t] = np.matmul(P[t-1],Pmat)

ratings = ['AAA', 'AA', 'A', 'BBB', 'BB', 'B', 'CCC', 'D']

# Step numbers 1..total_steps, so the x-axis matches the number of steps taken
steps = np.arange(1, total_steps + 1)

# One figure per starting rating, one curve per destination rating
for i, from_rating in enumerate(ratings):
    plt.figure(figsize=(10,8))
    for j, to_rating in enumerate(ratings):
        plt.plot(steps, P[:,i,j], label = f"{from_rating} -> {to_rating}")
    plt.xlim(0,100)
    plt.ylim(0,1.2)
    plt.title(f'N-Step Transition Probabilities for {from_rating}')
    plt.xlabel('Step n')
    plt.ylabel('Probability')
    plt.legend()
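
The n-step matrices built by repeated multiplication can be sanity-checked against `np.linalg.matrix_power` and the Chapman-Kolmogorov identity P^(m+n) = P^(m) P^(n). The following is a minimal sketch, not part of the original notebook; the names `P5_loop`, `P5_power` and `ck` are chosen here for illustration.

```python
import numpy as np

Pmat = np.array([
    [90.81, 8.33, 0.68, 0.06, 0.08, 0.02, 0.01, 0.01],
    [0.7, 90.65, 7.79, 0.64, 0.06, 0.13, 0.02, 0.01],
    [0.09, 2.27, 91.05, 5.52, 0.74, 0.26, 0.01, 0.06],
    [0.02, 0.33, 5.95, 85.93, 5.3, 1.17, 1.12, 0.18],
    [0.03, 0.14, 0.67, 7.73, 80.53, 8.84, 1, 1.06],
    [0.01, 0.11, 0.24, 0.43, 6.48, 83.46, 4.07, 5.2],
    [0.21, 0, 0.22, 1.3, 2.38, 11.24, 64.86, 19.79],
    [0, 0, 0, 0, 0, 0, 0, 100],
], dtype=float) / 100

# 5-step matrix by repeated multiplication, as in the loop above
P5_loop = Pmat.copy()
for _ in range(4):
    P5_loop = P5_loop @ Pmat

# The same matrix via NumPy's integer matrix power
P5_power = np.linalg.matrix_power(Pmat, 5)

# Chapman-Kolmogorov: a 5-step transition is a 2-step followed by a 3-step
ck = np.linalg.matrix_power(Pmat, 2) @ np.linalg.matrix_power(Pmat, 3)

print(np.allclose(P5_loop, P5_power), np.allclose(P5_power, ck))
# Every n-step matrix must still be row-stochastic
print(np.allclose(P5_power.sum(axis=1), 1.0))
```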

Question 2
The Markov chain has two communicating classes.
The first class contains the states "AAA", "AA", "A", "BBB", "BB", "B", and "CCC".
Each of these states can reach every other one in one or more steps, so they all
communicate. The class is transient: from each of its states the chain can move to "D"
and never come back.
The second class contains the single state "D". It is absorbing: once the chain enters
state "D", it cannot leave.
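
The class structure can also be derived mechanically from the transition matrix. The sketch below is an illustration rather than part of the original answer: it builds a reachability matrix with Warshall's transitive-closure algorithm and groups states that reach each other. The variable names (`reach`, `communicate`, `classes`) are chosen here.

```python
import numpy as np

ratings = ['AAA', 'AA', 'A', 'BBB', 'BB', 'B', 'CCC', 'D']
Pmat = np.array([
    [90.81, 8.33, 0.68, 0.06, 0.08, 0.02, 0.01, 0.01],
    [0.7, 90.65, 7.79, 0.64, 0.06, 0.13, 0.02, 0.01],
    [0.09, 2.27, 91.05, 5.52, 0.74, 0.26, 0.01, 0.06],
    [0.02, 0.33, 5.95, 85.93, 5.3, 1.17, 1.12, 0.18],
    [0.03, 0.14, 0.67, 7.73, 80.53, 8.84, 1, 1.06],
    [0.01, 0.11, 0.24, 0.43, 6.48, 83.46, 4.07, 5.2],
    [0.21, 0, 0.22, 1.3, 2.38, 11.24, 64.86, 19.79],
    [0, 0, 0, 0, 0, 0, 0, 100],
], dtype=float) / 100
n = len(ratings)

# reach[i, j] is True if j can be reached from i in zero or more steps
reach = (Pmat > 0) | np.eye(n, dtype=bool)
for k in range(n):
    # Warshall step: allow paths that pass through state k
    reach = reach | (reach[:, [k]] & reach[[k], :])

# Two states communicate if each can reach the other
communicate = reach & reach.T

classes = []
for i in range(n):
    members = [ratings[j] for j in range(n) if communicate[i, j]]
    if members not in classes:
        classes.append(members)

print(classes)
# → [['AAA', 'AA', 'A', 'BBB', 'BB', 'B', 'CCC'], ['D']]
```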
Question 3
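Every state has a positive one-step probability of returning to itself: the diagonal of
Pmat is strictly positive, and the absorbing state "D" (a state that, once entered,
cannot be left) returns to itself with probability 1. Every state therefore has period 1,
making the chain aperiodic.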
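
The period of a state is the greatest common divisor of all step counts at which a return has positive probability. A minimal sketch of that definition follows; the helper `period` and its `max_steps` cutoff are hypothetical names introduced here, and scanning a finite number of steps is a heuristic rather than a proof.

```python
import numpy as np
from math import gcd

ratings = ['AAA', 'AA', 'A', 'BBB', 'BB', 'B', 'CCC', 'D']
Pmat = np.array([
    [90.81, 8.33, 0.68, 0.06, 0.08, 0.02, 0.01, 0.01],
    [0.7, 90.65, 7.79, 0.64, 0.06, 0.13, 0.02, 0.01],
    [0.09, 2.27, 91.05, 5.52, 0.74, 0.26, 0.01, 0.06],
    [0.02, 0.33, 5.95, 85.93, 5.3, 1.17, 1.12, 0.18],
    [0.03, 0.14, 0.67, 7.73, 80.53, 8.84, 1, 1.06],
    [0.01, 0.11, 0.24, 0.43, 6.48, 83.46, 4.07, 5.2],
    [0.21, 0, 0.22, 1.3, 2.38, 11.24, 64.86, 19.79],
    [0, 0, 0, 0, 0, 0, 0, 100],
], dtype=float) / 100


def period(P, i, max_steps=20):
    """GCD of all step counts <= max_steps with a positive return probability to i."""
    g = 0
    Pn = np.eye(len(P))
    for step in range(1, max_steps + 1):
        Pn = Pn @ P              # Pn is now the step-step transition matrix
        if Pn[i, i] > 0:
            g = gcd(g, step)     # gcd(0, k) == k, so the first hit initialises g
    return g


periods = {r: period(Pmat, i) for i, r in enumerate(ratings)}
print(periods)
```

Because every diagonal entry of Pmat is positive, a return is already possible after one step, so each GCD collapses to 1.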
Question 4
In [3]: np.random.seed(1234)
states = ['AAA']
transition_lh = 1.0
summary = pd.DataFrame(columns=['Step','State_i','State_j','Transition Pr
for t in range(total_steps):
    next_state = np.random.choice(range(n), p = Pmat[ratings.index(states
    states.append(ratings[next_state])
    transition_lh = Pmat[ratings.index(states[-2]),ratings.index(states[-
    summary = pd.concat([summary, pd.DataFrame({'Step':[t+1],'State_i':[s
                                                'Transition Probability':
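
The simulation loop can be sketched in self-contained form as follows. This is a minimal sketch under stated assumptions, not the notebook's exact cell: the function name `simulate_path` is hypothetical, and it uses `np.random.default_rng` instead of `np.random.seed`, so the sampled path will differ from the one plotted in this notebook.

```python
import numpy as np

ratings = ['AAA', 'AA', 'A', 'BBB', 'BB', 'B', 'CCC', 'D']
Pmat = np.array([
    [90.81, 8.33, 0.68, 0.06, 0.08, 0.02, 0.01, 0.01],
    [0.7, 90.65, 7.79, 0.64, 0.06, 0.13, 0.02, 0.01],
    [0.09, 2.27, 91.05, 5.52, 0.74, 0.26, 0.01, 0.06],
    [0.02, 0.33, 5.95, 85.93, 5.3, 1.17, 1.12, 0.18],
    [0.03, 0.14, 0.67, 7.73, 80.53, 8.84, 1, 1.06],
    [0.01, 0.11, 0.24, 0.43, 6.48, 83.46, 4.07, 5.2],
    [0.21, 0, 0.22, 1.3, 2.38, 11.24, 64.86, 19.79],
    [0, 0, 0, 0, 0, 0, 0, 100],
], dtype=float) / 100


def simulate_path(P, start, steps, rng):
    """Sample a path of state indices and the probability of each transition taken."""
    path = [start]
    probs = []
    for _ in range(steps):
        i = path[-1]
        j = int(rng.choice(len(P), p=P[i]))  # draw the next state from row i
        probs.append(P[i, j])
        path.append(j)
    return path, np.array(probs)


rng = np.random.default_rng(1234)
path, probs = simulate_path(Pmat, ratings.index('AAA'), 100, rng)
labels = [ratings[s] for s in path]
print(labels[:10])
```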

In [4]: plt.figure(figsize=(10,5))
plt.step(range(total_steps+1),states)
plt.title('Bond Transition Simulation with Initial AAA Rating')
plt.xlabel('Time (t)')
plt.ylabel('State')

Out[4]: Text(0, 0.5, 'State')

Likelihood of each transition, and entire simulated sequence:


In [5]: display(summary)

     Step State_i State_j  Transition Probability
0       1     AAA     AAA                  0.9081
1       2     AAA     AAA                  0.9081
2       3     AAA     AAA                  0.9081
3       4     AAA     AAA                  0.9081
4       5     AAA     AAA                  0.9081
...   ...     ...     ...                     ...
95     96      BB      BB                  0.8053
96     97      BB      BB                  0.8053
97     98      BB      BB                  0.8053
98     99      BB     BBB                  0.0773
99    100     BBB     BBB                  0.8593

100 rows × 4 columns
In [6]: print(f"Sequence Likelihood: {np.product(summary['Transition Probability'

Sequence Likelihood: 7.571112140228321e-21
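
The sequence likelihood is the product of the one-step transition probabilities along the path. The sketch below illustrates this on a short hypothetical path (not the simulated one) and uses `np.prod`, since the `np.product` alias was removed in NumPy 2.0. Summing logs is shown as an alternative, because products over long paths become very small.

```python
import numpy as np

ratings = ['AAA', 'AA', 'A', 'BBB', 'BB', 'B', 'CCC', 'D']
Pmat = np.array([
    [90.81, 8.33, 0.68, 0.06, 0.08, 0.02, 0.01, 0.01],
    [0.7, 90.65, 7.79, 0.64, 0.06, 0.13, 0.02, 0.01],
    [0.09, 2.27, 91.05, 5.52, 0.74, 0.26, 0.01, 0.06],
    [0.02, 0.33, 5.95, 85.93, 5.3, 1.17, 1.12, 0.18],
    [0.03, 0.14, 0.67, 7.73, 80.53, 8.84, 1, 1.06],
    [0.01, 0.11, 0.24, 0.43, 6.48, 83.46, 4.07, 5.2],
    [0.21, 0, 0.22, 1.3, 2.38, 11.24, 64.86, 19.79],
    [0, 0, 0, 0, 0, 0, 0, 100],
], dtype=float) / 100

# Hypothetical short path, chosen only for illustration
path = ['AAA', 'AAA', 'AA', 'AA', 'A']
idx = [ratings.index(s) for s in path]

# Probability of each transition (from idx[k] to idx[k+1]) along the path
step_probs = Pmat[idx[:-1], idx[1:]]

likelihood = np.prod(step_probs)
log_likelihood = np.sum(np.log(step_probs))  # numerically safer for long paths
print(likelihood, log_likelihood)
```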

Question 5
In [8]: Q = Pmat[:-1,:-1]
I = np.identity(Q.shape[0])
N = np.linalg.inv(I - Q)

title = "Expected No. of Transitions Prior to Entering Absorbing State"


display(HTML(f'<h4>{title}</h4>'))
display(pd.DataFrame(N, index=ratings[:-1], columns=ratings[:-1]))

Expected No. of Transitions Prior to Entering Absorbing State


           AAA         AA          A        BBB         BB          B       CCC
AAA  12.888602  20.423473  32.921814  19.773631   9.880736   8.693628  1.942995
AA    2.096882  21.566386  33.163148  19.873529   9.878730   8.700411  1.944556
A     1.273665  10.541247  34.796554  20.045532   9.945452   8.686041  1.944228
BBB   0.891828   7.125979  21.863100  21.915782   9.931234   8.610521  1.988953
BB    0.616925   4.709273  14.029035  13.026152  12.483263   9.095669  1.830750
B     0.373842   2.744633   7.972364   7.147212   6.179523  11.156707  1.699788
CCC   0.279353   1.648535   4.723658   4.222820   3.310801   4.609543  3.610820
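
Entry (i, j) of the fundamental matrix N = (I - Q)^-1 is the expected number of periods the chain spends in rating j when it starts in rating i, counted before default (the starting period is included when i = j). Summing a row therefore gives the expected number of periods until absorption in "D". The following sketch illustrates this; the names `t_absorb` and `absorb_prob` are introduced here and are not part of the notebook.

```python
import numpy as np

ratings = ['AAA', 'AA', 'A', 'BBB', 'BB', 'B', 'CCC', 'D']
Pmat = np.array([
    [90.81, 8.33, 0.68, 0.06, 0.08, 0.02, 0.01, 0.01],
    [0.7, 90.65, 7.79, 0.64, 0.06, 0.13, 0.02, 0.01],
    [0.09, 2.27, 91.05, 5.52, 0.74, 0.26, 0.01, 0.06],
    [0.02, 0.33, 5.95, 85.93, 5.3, 1.17, 1.12, 0.18],
    [0.03, 0.14, 0.67, 7.73, 80.53, 8.84, 1, 1.06],
    [0.01, 0.11, 0.24, 0.43, 6.48, 83.46, 4.07, 5.2],
    [0.21, 0, 0.22, 1.3, 2.38, 11.24, 64.86, 19.79],
    [0, 0, 0, 0, 0, 0, 0, 100],
], dtype=float) / 100

Q = Pmat[:-1, :-1]            # transitions among the transient ratings
R = Pmat[:-1, -1]             # one-step default probabilities
I = np.identity(Q.shape[0])
N = np.linalg.inv(I - Q)      # fundamental matrix

# Expected number of periods before default, by starting rating
t_absorb = N.sum(axis=1)

# Probability of eventually defaulting; should be 1 from every transient rating
absorb_prob = N @ R

for label, t in zip(ratings[:-1], t_absorb):
    print(f"{label}: {t:.2f}")
```

Solving the linear system `(I - Q) t = 1` with `np.linalg.solve` gives the same vector without forming the inverse explicitly.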
Question 6
In [13]: import seaborn as sns  # needed for the heatmap below

# Submatrix H of transition probabilities between transient states
H = Pmat[0:(n-1),0:(n-1)]

# Fundamental matrix Z = (I - H)^-1
Z = np.linalg.inv(np.identity(H.shape[0])-H)

# Empty DataFrame to store the results
results = pd.DataFrame(index=ratings[0:-1], columns=ratings[0:-1])

# Q_ij: probability of ever reaching state j from state i before default
for i in range(n-1):
    for j in range(n-1):
        if i == j:
            # Return probability: f_jj = (Z_jj - 1) / Z_jj
            Q_ij = (Z[i, j] - 1) / Z[j, j]
        else:
            # Hitting probability: f_ij = Z_ij / Z_jj
            Q_ij = Z[i, j] / Z[j, j]
        results.iloc[i, j] = Q_ij

# Print the results
print(results)

# Heatmap of the same table
sns.heatmap(results.astype(float), annot=True, cmap='YlGnBu')
plt.show()

          AAA        AA         A       BBB        BB         B       CCC
AAA  0.922412  0.947005  0.946123  0.902255  0.791519  0.779229  0.538104
AA   0.162693  0.953632  0.953058  0.906814  0.791358  0.779837  0.538536
A    0.098821  0.488781  0.971262  0.914662  0.796703  0.778549  0.538445
BBB  0.069195  0.330421  0.628312  0.954371  0.795564   0.77178  0.550831
BB   0.047866  0.218362  0.403173  0.594373  0.919893  0.815265  0.507018
B    0.029006  0.127264  0.229114  0.326122  0.495025  0.910368  0.470749
CCC  0.021674   0.07644  0.135751  0.192684  0.265219  0.413163  0.723055

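Every entry of the table is strictly positive, so each non-default rating can be reached
from every other one: the seven transient states communicate with one another. The full
chain is not irreducible, however, because "D" is absorbing and no other state can be
reached from it. Every entry is also below 1, since the chain can default before it
reaches the target rating.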
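
The probabilities of ever reaching a rating can be cross-checked without the fundamental-matrix shortcut. The sketch below is an independent check under stated assumptions, not the notebook's method: for each target j it treats j as a second absorbing state and solves a linear system for the probability of hitting j before default. The names `F_formula` and `F_direct` are introduced here.

```python
import numpy as np

Pmat = np.array([
    [90.81, 8.33, 0.68, 0.06, 0.08, 0.02, 0.01, 0.01],
    [0.7, 90.65, 7.79, 0.64, 0.06, 0.13, 0.02, 0.01],
    [0.09, 2.27, 91.05, 5.52, 0.74, 0.26, 0.01, 0.06],
    [0.02, 0.33, 5.95, 85.93, 5.3, 1.17, 1.12, 0.18],
    [0.03, 0.14, 0.67, 7.73, 80.53, 8.84, 1, 1.06],
    [0.01, 0.11, 0.24, 0.43, 6.48, 83.46, 4.07, 5.2],
    [0.21, 0, 0.22, 1.3, 2.38, 11.24, 64.86, 19.79],
    [0, 0, 0, 0, 0, 0, 0, 100],
], dtype=float) / 100
m = 7  # number of transient ratings

# Shortcut: f_ij = N_ij / N_jj off the diagonal, f_jj = (N_jj - 1) / N_jj on it
N = np.linalg.inv(np.eye(m) - Pmat[:m, :m])
F_formula = N / np.diag(N)                    # broadcasting divides column j by N_jj
F_formula[np.diag_indices(m)] = (np.diag(N) - 1) / np.diag(N)

# Direct computation: make j absorbing and solve for the hitting probabilities
F_direct = np.zeros((m, m))
for j in range(m):
    others = [k for k in range(m) if k != j]
    Qj = Pmat[np.ix_(others, others)]         # transitions that avoid j and D
    h = np.linalg.solve(np.eye(m - 1) - Qj, Pmat[others, j])
    F_direct[others, j] = h                   # hit j before default, from i != j
    F_direct[j, j] = Pmat[j, j] + Pmat[j, others] @ h  # return to j after leaving

print(np.allclose(F_formula, F_direct))
```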
Question 7
In [14]: # Number of states and number of periods
N = 8
T = 5

# f[t,i,j]: probability that the first passage from i to j happens at step t+1
# Pbar[j]: Pmat with column j set to zero, so that paths through j are excluded
f = np.zeros((T,N,N), dtype=np.float64)
Pbar = np.zeros((N,N,N), dtype=np.float64)

# Build Pbar[j] for each target state j
for j in range(N):
    for i in range(N):
        for k in range(N):
            if k != j:
                Pbar[j,i,k] = Pmat[i,k]
            else:
                Pbar[j,i,k] = 0

# First-passage recursion: f(1) = Pmat[:,j], then f(t) = Pbar[j] f(t-1)
for j in range(N):
    for t in range(T):
        if t == 0:
            f[t,:,j] = Pmat[:,j]
        else:
            f[t,:,j] = np.matmul(Pbar[j,:,:], f[t-1,:,j])

# Probabilities of reaching AAA (index 0) and CCC (index 6) within 5 periods
prob_AAA = np.sum(f[:T,:,0], axis=0)
prob_CCC = np.sum(f[:T,:,6], axis=0)

# Print the probabilities
print("Probability of reaching AAA rating within 5 periods:")
for i, label in enumerate(ratings):
    print(f"From {label}: {prob_AAA[i]:.6f}")

Probability of reaching AAA rating within 5 periods:


From AAA: 0.910192
From AA: 0.029763
From A: 0.005267
From BBB: 0.001754
From BB: 0.001586
From B: 0.001124
From CCC: 0.005535
From D: 0.000000

These are first-passage probabilities. A bond currently rated "AAA" has a 91.02% chance
of being rated "AAA" again at some point within the next 5 periods. A bond currently
rated "AA" has only a 2.98% chance of being upgraded to "AAA" within 5 periods, and so on.
In [15]: # Print the probabilities
print("\nProbability of reaching CCC rating within 5 periods:")
for i, label in enumerate(ratings):
    print(f"From {label}: {prob_CCC[i]:.6f}")

Probability of reaching CCC rating within 5 periods:


From AAA: 0.000945
From AA: 0.002568
From A: 0.007545
From BBB: 0.051917
From BB: 0.066085
From B: 0.153521
From CCC: 0.665099
From D: 0.000000

A bond currently rated "AAA" has only a 0.09% chance of being downgraded to "CCC" within
5 periods. A bond currently rated "CCC" has a 66.51% chance of being rated "CCC" again at
some point within 5 periods; this is a return probability, not the probability of being
rated "CCC" after exactly 5 periods. The other rows read the same way.
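
The first-passage recursion can be cross-checked with a matrix power. The sketch below is an illustration, not the notebook's cell: it makes the target rating absorbing, so the probability of having visited it within T - 1 steps can be read off a power of the modified matrix, and then prepends one ordinary step. The function names `reach_within` and `reach_within_absorbing` are hypothetical.

```python
import numpy as np

Pmat = np.array([
    [90.81, 8.33, 0.68, 0.06, 0.08, 0.02, 0.01, 0.01],
    [0.7, 90.65, 7.79, 0.64, 0.06, 0.13, 0.02, 0.01],
    [0.09, 2.27, 91.05, 5.52, 0.74, 0.26, 0.01, 0.06],
    [0.02, 0.33, 5.95, 85.93, 5.3, 1.17, 1.12, 0.18],
    [0.03, 0.14, 0.67, 7.73, 80.53, 8.84, 1, 1.06],
    [0.01, 0.11, 0.24, 0.43, 6.48, 83.46, 4.07, 5.2],
    [0.21, 0, 0.22, 1.3, 2.38, 11.24, 64.86, 19.79],
    [0, 0, 0, 0, 0, 0, 0, 100],
], dtype=float) / 100


def reach_within(P, j, T):
    """First-passage recursion: sum of f(1), ..., f(T) for target state j."""
    Pbar = P.copy()
    Pbar[:, j] = 0               # forbid passing through j before the final step
    f = P[:, j].copy()           # f(1): reach j in exactly one step
    total = f.copy()
    for _ in range(T - 1):
        f = Pbar @ f             # f(t) = Pbar f(t-1)
        total += f
    return total


def reach_within_absorbing(P, j, T):
    """Same quantity: take one step, then ask whether j is hit within T - 1 more steps."""
    Pa = P.copy()
    Pa[j, :] = 0
    Pa[j, j] = 1                 # make j absorbing
    hit = np.linalg.matrix_power(Pa, T - 1)[:, j]
    return P @ hit


for j in (0, 6):                 # AAA and CCC
    print(np.allclose(reach_within(Pmat, j, 5), reach_within_absorbing(Pmat, j, 5)))
```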
In [16]: # Print fi,i for each state
for i in range(n):
    print(f'fi,{ratings[i]}: {round(P[0][i,i],4)}')

fi,AAA: 0.9081
fi,AA: 0.9065
fi,A: 0.9105
fi,BBB: 0.8593
fi,BB: 0.8053
fi,B: 0.8346
fi,CCC: 0.6486
fi,D: 1.0

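These one-step return probabilities are the diagonal entries of Pmat. They confirm the
intuition that nearly all of the 5-period return probability comes from keeping the
rating in the first period (0.9081 of 0.9102 for AAA, 0.6486 of 0.6651 for CCC).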