RL Poster Activity
INTRODUCTION
The research paper develops an efficient multitask reinforcement learning (RL) algorithm that does not sacrifice
performance, proposing a method called Iterative Sparse Bayesian Policy Optimization (ISBPO) for sequential
multitask RL. The paper addresses the problem of learning multiple tasks one after another, which is common in
real-world applications. ISBPO handles a sequence of control tasks by consecutively optimizing policy network
weights based on SBPO. The SBPO algorithm produces sparse weights w*_k and a corresponding binary mask m_k,
which records the valid weight positions of the policy network.
THEORETICAL EXPLANATION
Sparse Bayesian Policy Optimization (SBPO):
- Model weights w as random variables with posterior q(w) = N(μ, σ^2)
- Training via Variational Inference (VI)
- Evidence Lower Bound (ELBO):
L(φ) = E[log p(D|w)] - λ DKL(q(w)||p(w))
- Stochastic Gradient Variational Bayes (SGVB) update:
φ ← φ + η ∇_φ L(φ)
- Pruning using Signal-to-Noise Ratio (SNR):
SNR(w_i) = (E[w_i])^2 / Var[w_i] = μ_i^2 / σ_i^2
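The SNR pruning criterion above can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's implementation: the layer size, the variational parameters, and the pruning threshold are all assumed values chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical variational parameters for a small policy layer:
# each weight w_i has posterior q(w_i) = N(mu_i, sigma_i^2).
mu = rng.normal(0.0, 1.0, size=8)      # posterior means
sigma = rng.uniform(0.1, 2.0, size=8)  # posterior standard deviations

# Signal-to-noise ratio per weight: SNR(w_i) = mu_i^2 / sigma_i^2.
snr = mu**2 / sigma**2

# Prune weights whose SNR falls below a threshold (assumed value).
threshold = 1.0
mask = (snr >= threshold).astype(np.int8)  # binary mask m_k of surviving positions
w_star = mu * mask                          # sparse weights w*_k (survivors' means)

print("SNR :", np.round(snr, 2))
print("mask:", mask)
```

Weights with high mean relative to their posterior uncertainty survive; the rest are zeroed out, yielding the sparse weights and binary mask that ISBPO carries forward.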
Iterative Learning:
- Task sequence T1:K = [T1, T2, ..., TK]
Proposed Method: Iterative Sparse Bayesian Policy Optimization (ISBPO)
Overview of the proposed ISBPO method:
- For task T_{k+1}, after learning T_{1:k}:
- Trainable weight space: W_{k+1} = (W*_1 ⊕ W*_2 ⊕ ... ⊕ W*_k)^⊥
- Train w_{k+1} ∈ W_{k+1} to get w*_{k+1} using the SBPO technique
- Randomizing weights using Variational Inference (VI)
- Overall weights: w = w*_1 + w*_2 + ... + w*_k
- Training objective with sparsity-inducing regularization
- Pruning based on Signal-to-Noise Ratio (SNR) criterion
Key Contributions:
- Iterative learning of multiple tasks using ISBPO
- Sequential learning of tasks without catastrophic forgetting
- Reusing surviving weights for knowledge transfer
- Efficient weight allocation and pruning
- Task-specific binary masks for activation of relevant weights
- Improved sample efficiency via knowledge transfer
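The iterative scheme above can be sketched as a toy mask-bookkeeping loop. This is a minimal NumPy illustration under stated assumptions, not the paper's algorithm: the "training" step is random placeholder data rather than an RL update, and the network is a flat vector of 10 weights.

```python
import numpy as np

rng = np.random.default_rng(1)
n_weights = 10

# Shared parameter vector and the per-task binary masks that record
# which positions each task's sparse weights w*_k occupy.
w_total = np.zeros(n_weights)
masks = []

def learn_task(w_total, masks, keep=3):
    """Toy stand-in for SBPO on one task: train only in the free weight
    space (positions unused by all previous masks), then keep a few
    surviving weights as this task's sparse solution w*_{k+1}.
    The weight values here are random placeholders, not learned."""
    used = np.zeros(n_weights, dtype=bool)
    for m in masks:
        used |= m.astype(bool)
    free = np.flatnonzero(~used)  # trainable weight space W_{k+1}
    chosen = rng.choice(free, size=min(keep, free.size), replace=False)
    mask_k = np.zeros(n_weights, dtype=np.int8)
    mask_k[chosen] = 1
    w_star_k = np.zeros(n_weights)
    w_star_k[chosen] = rng.normal(size=chosen.size)
    return w_total + w_star_k, masks + [mask_k]

# Learn T1, T2, T3 sequentially; each task claims disjoint positions.
for _ in range(3):
    w_total, masks = learn_task(w_total, masks)

# Task-specific inference: activate only task 2's weights via its mask.
w_task_2 = w_total * masks[1]
```

Because each task trains only in the orthogonal complement of the previously claimed positions, earlier tasks' weights are never overwritten, which is how the mask bookkeeping avoids catastrophic forgetting.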
These results confirm that ISBPO uses limited policy network resources efficiently and economically.
CONCLUSION
The ISBPO algorithm is designed to handle a sequence of control tasks by consecutively optimizing policy network weights based on the Sparse Bayesian Policy
Optimization (SBPO) algorithm. The algorithm produces sparse weights and corresponding binary masks, which are used to train for new tasks while reusing
previously learned weights. The results of the experiments on robot manipulation tasks and image-based dexterous manipulation tasks demonstrate the
effectiveness of the ISBPO algorithm in handling multiple tasks efficiently. The paper also discusses the use of sparsity-inducing prior distributions to intentionally
obtain a sparse network and the optimization process using a gradient ascent method. Overall, the ISBPO algorithm provides a principled and efficient solution for
multitask RL, which is crucial for real-world applications.