[Original Paper - ImageNet Classification with Deep Convolutional Neural Networks (2012)](https://www.cs.toronto.edu/~hinton/absps/imagenet.pdf) Some questions I had when I was reading the paper - [What does the term saturating nonlinearities mean?](https://stats.stackexchange.com/questions/174295/what-does-the-term-saturating-nonlinearities-mean) - [What Is Saturating Gradient Problem](https://datascience.stackexchange.com/questions/27665/what-is-saturating-gradient-problem) - [Why ReLU is better than the other activation functions](https://datascience.stackexchange.com/questions/23493/why-relu-is-better-than-the-other-activation-functions) - [Why does overlapped pooling help reduce overfitting in conv nets?](https://stats.stackexchange.com/questions/283261/why-does-overlapped-pooling-help-reduce-overfitting-in-conv-nets) - [Importance of local response normalization in CNN](https://stats.stackexchange.com/questions/145768/importance-of-local-response-normalization-in-cnn) - [What Is Local Response Normalization In Convolutional Neural Networks](https://prateekvjoshi.com/2016/04/05/what-is-local-response-normalization-in-convolutional-neural-networks/)