1 Related Works
References
[1] Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan
Lowe, Joelle Pineau, Aaron Courville, and Yoshua Bengio. An Actor-
Critic Algorithm for Sequence Prediction. arXiv:1607.07086v1 [cs.LG],
2016. URL http://arxiv.org/abs/1607.07086.
[2] Bram Bakker. Reinforcement Learning with Long Short-Term Memory.
Advances in Neural Information Processing Systems 14, pages 1475-1482,
2002. ISSN 1049-5258.
[3] David Balduzzi and Muhammad Ghifary. Compatible Value Gradients for
Reinforcement Learning of Continuous Deep Policies. pages 1-27, 2015.
URL http://arxiv.org/abs/1509.03005.
[4] Michael Fairbank, Eduardo Alonso, and Danil Prokhorov. An equivalence
between adaptive dynamic programming with a critic and backpropagation
through time. IEEE Transactions on Neural Networks and Learning Systems,
24(12):2088-2100, 2013. ISSN 2162-237X. doi:
10.1109/TNNLS.2013.2271778.
[5] F. J. Gomez and J. Schmidhuber. Co-Evolving Recurrent Neurons Learn
Deep Memory POMDPs. Galleria Rassegna Bimestrale Di Cultura, pages
1-14, 2004. doi: 10.1145/1068009.1068092.
[6] Samuel W Hasinoff. Reinforcement Learning for Problems with Hidden
State. University of Toronto, Technical Report, pages 1-18, 2003.
[7] Matthew Hausknecht and Peter Stone. Deep Recurrent Q-Learning for
Partially Observable MDPs. 2015.
[8] Nicolas Heess, Jonathan J Hunt, Timothy P Lillicrap, and David Silver.
Memory-based control with recurrent neural networks. arXiv, pages 1-11,
2015. URL http://arxiv.org/abs/1512.04455.
[9] Natasha Jaques, Shixiang Gu, Richard E Turner, and Douglas Eck.
Generating Music by Fine-Tuning Recurrent Neural Networks with
Reinforcement Learning. pages 1-11, 2016.
[10] Michael I Jordan and Robert A Jacobs. Learning to control an unstable
system with forward modeling. Advances in Neural Information Processing
Systems, 2:324-331, 1990.
[11] L.J. Lin and T.M. Mitchell. Memory approaches to reinforcement learning
in non-Markovian domains. Artificial Intelligence, 8(7597):28, 1992. URL
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.319.
[12] Derrick H. Nguyen and Bernard Widrow. Neural Networks for Self-Learning
Control Systems, 1990. ISSN 0272-1708.
[14] Danil V. Prokhorov and Donald C. Wunsch. Adaptive critic designs. IEEE
Transactions on Neural Networks, 8(5):997-1007, 1997. ISSN 1045-9227.
doi: 10.1109/72.623201.
[15] Jürgen Schmidhuber, Sepp Hochreiter, and Yoshua Bengio. Evaluating
Long-term Dependency Benchmark Problems by Random Guessing.
1997.
[16] Daan Wierstra and Alexander Förster. Recurrent Policy Gradients. May 2009.