In this lecture, we revisit the vanishing gradient problem. The techniques that make recurrent networks robust to it can also be applied to deep neural networks in general.
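As a rough illustration of the problem, the sketch below backpropagates a unit-norm gradient through a stack of tanh layers, with and without an identity skip connection of the kind studied in the He et al. reading. The function name `input_gradient_norm`, the width, the weight scale, and the seed are all assumptions chosen for this toy example and are not taken from either paper.

```python
import numpy as np


def input_gradient_norm(depth, width=32, residual=False, seed=0):
    """Norm of the gradient reaching the input after `depth` tanh layers.

    Illustrative toy setup (an assumption of this sketch): random Gaussian
    weights with a small scale, so a plain stack shrinks gradients.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    scale = 0.5 / np.sqrt(width)
    weights = [rng.standard_normal((width, width)) * scale for _ in range(depth)]

    # Forward pass: cache tanh'(z) = 1 - tanh(z)^2 for every layer.
    h = x
    local_derivs = []
    for W in weights:
        a = np.tanh(W @ h)
        local_derivs.append(1.0 - a ** 2)
        # Residual layer: h_next = h + tanh(W h); plain layer: h_next = tanh(W h).
        h = h + a if residual else a

    # Backward pass: start from a unit-norm gradient at the output.
    g = np.ones(width) / np.sqrt(width)
    for W, d in zip(reversed(weights), reversed(local_derivs)):
        back = W.T @ (d * g)  # gradient through the tanh(W h) branch
        # The skip connection adds an identity path for the gradient.
        g = g + back if residual else back
    return float(np.linalg.norm(g))


if __name__ == "__main__":
    for depth in (5, 20, 50):
        plain = input_gradient_norm(depth, residual=False)
        skip = input_gradient_norm(depth, residual=True)
        print(f"depth={depth:3d}  plain={plain:.3e}  residual={skip:.3e}")
```

With these settings, the plain stack's gradient norm should collapse toward zero as depth grows, while the residual stack keeps a usable gradient because the identity path passes it through unchanged. Highway networks (the Srivastava et al. reading) use a learned gate in place of the fixed identity path, much like the gates of an LSTM.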
Lectures and readings
Srivastava, Rupesh K., Klaus Greff, and Jürgen Schmidhuber. “Training very deep networks.” In Advances in neural information processing systems, pp. 2377-2385. 2015.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016.