In this lecture, we will look at the problem of modeling sequences with neural networks. We will first study recurrent neural networks (RNNs) and then move on to their more commonly used variants, namely the long short-term memory (LSTM) network and the gated recurrent unit (GRU).
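To make the idea of recurrence concrete before the readings, here is a minimal sketch of a simple Elman-style RNN forward pass in plain Python. It assumes the common formulation h_t = tanh(W x_t + U h_{t-1} + b) with a zero initial state. The helper names `matvec`, `elman_step`, and `run_rnn` and the toy weights are hypothetical illustrations, not code from the lecture. The sketch also omits training (backpropagation through time).

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def elman_step(x: Vector, h_prev: Vector, W: Matrix, U: Matrix, b: Vector) -> Vector:
    """One recurrent step, assuming h_t = tanh(W x_t + U h_{t-1} + b)."""
    wx = matvec(W, x)       # input-to-hidden contribution
    uh = matvec(U, h_prev)  # hidden-to-hidden (recurrent) contribution
    return [math.tanh(p + q + c) for p, q, c in zip(wx, uh, b)]


def run_rnn(xs: List[Vector], W: Matrix, U: Matrix, b: Vector) -> List[Vector]:
    """Apply the same cell, with shared weights, to every element of a sequence."""
    h = [0.0] * len(b)  # assumed initial state h_0 = 0
    states = []
    for x in xs:
        h = elman_step(x, h, W, U, b)
        states.append(h)
    return states


# Hypothetical toy weights: 1-dimensional input, 2-dimensional hidden state.
W = [[1.0], [-1.0]]
U = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]
xs = [[1.0], [0.0]]

states = run_rnn(xs, W, U, b)
for t, h in enumerate(states, start=1):
    print(f"h_{t} = {h}")
```

Although the second input is zero, h_2 is nonzero: it carries information from h_1 through the recurrent matrix U. LSTMs and GRUs keep this kind of state but add gates that control how much of it is kept or overwritten at each step.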
Lectures and readings
Readings
Chapter 14 of Goldberg, Yoav. "Neural network methods for natural language processing." Synthesis Lectures on Human Language Technologies 10, no. 1 (2017): 1-309.
Papers

Elman, Jeffrey L. “Finding structure in time.” Cognitive Science 14, no. 2 (1990): 179-211.

Pearlmutter, Barak A. “Gradient calculations for dynamic recurrent neural networks: A survey.” IEEE Transactions on Neural Networks 6, no. 5 (1995): 1212-1228.

Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural Computation 9, no. 8 (1997): 1735-1780.

Schuster, Mike, and Kuldip K. Paliwal. “Bidirectional recurrent neural networks.” IEEE Transactions on Signal Processing 45, no. 11 (1997): 2673-2681.

Lipton, Zachary C., John Berkowitz, and Charles Elkan. “A critical review of recurrent neural networks for sequence learning.” arXiv preprint arXiv:1506.00019 (2015).

(*) Chung, Junyoung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. “Gated feedback recurrent neural networks.” In International Conference on Machine Learning, pp. 2067-2075. 2015.

(*) Greff, Klaus, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. “LSTM: A search space odyssey.” IEEE Transactions on Neural Networks and Learning Systems 28, no. 10 (2017): 2222-2232.
Blog posts
Several blog posts explain and explore various facets of recurrent neural networks.

The Unreasonable Effectiveness of Recurrent Neural Networks: A blog post that talks