In this lecture, we will revisit the vanishing gradient problem. The general techniques used to make recurrent networks robust can be applied for general deep neural networks.
Lectures and readings
-
Srivastava, Rupesh K., Klaus Greff, and Jürgen Schmidhuber. “Training very deep networks.” In Advances in neural information processing systems, pp. 2377-2385. 2015.
-
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016.