ImageNet classification with deep convolutional neural networks

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2017. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 6 (June 2017), 84–90. DOI:

This paper reports state-of-the-art classification results on images from the ImageNet LSVRC-2010 and LSVRC-2012 challenges. The authors classify 1.2 million images into 1000 classes. They use a convolutional neural network (CNN) because its capacity can be controlled by varying its depth and breadth, and because CNNs have fewer connections and parameters than standard feedforward networks with similarly sized layers, which makes them easier to train. At the time of writing, CNNs were still computationally expensive to apply at this scale, but the GPUs then available, paired with an optimized implementation of 2D convolution (in which the weighted values under a filter are summed into a single output pixel), made training feasible. They chose Rectified Linear Units (ReLUs) over traditional tanh units because networks with ReLUs train faster, and faster training improves the performance of large models. They spread the CNN across two GPUs, putting half the kernels (neurons) on each GPU and letting the GPUs communicate only in certain layers. They note that choosing the connectivity pattern is a problem for cross-validation, but that it lets them tune the amount of communication relative to the amount of computation. To reduce overfitting, they use two techniques: data augmentation and dropout. Data augmentation (artificially enlarging the dataset) takes two forms. First, they generate image translations and horizontal reflections. Second, they alter the intensities of the RGB channels in the training images. Dropout, which they describe as an efficient form of model combination, randomly drops neurons during training so that they do not participate in the forward pass or in backpropagation.
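
The three mechanisms named in the summary above (ReLU activation, dropout, and translation/reflection augmentation) can be sketched in a few lines of NumPy. This is only a minimal illustration, not the authors' GPU implementation. The function names are hypothetical, and the specific values used here (a drop probability of 0.5, test-time scaling by 0.5, and 224 x 224 crops taken from 256 x 256 images) are recalled from the paper rather than stated in this summary, so treat them as assumptions. The RGB intensity alteration is omitted.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: max(0, x), applied elementwise."""
    return np.maximum(0.0, x)


def dropout(activations, rng, p_drop=0.5, training=True):
    """Zero each unit independently with probability p_drop while training.

    At test time no units are dropped; outputs are scaled by (1 - p_drop)
    so their expected magnitude matches what was seen during training.
    p_drop=0.5 is an assumed default, not taken from the summary.
    """
    if training:
        keep_mask = rng.random(activations.shape) >= p_drop
        return activations * keep_mask
    return activations * (1.0 - p_drop)


def random_crop_and_flip(image, rng, crop=224):
    """Return a random crop x crop patch, mirrored left-right half the time.

    `image` is assumed to be an (H, W, C) array with H, W >= crop.
    """
    height, width, _ = image.shape
    top = rng.integers(0, height - crop + 1)
    left = rng.integers(0, width - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch


rng = np.random.default_rng(0)
hidden = relu(rng.standard_normal((4, 8)))      # non-negative activations
train_out = dropout(hidden, rng)                # some units zeroed
test_out = dropout(hidden, rng, training=False) # all units kept, scaled
patch = random_crop_and_flip(np.zeros((256, 256, 3)), rng)
```

Because a dropped unit outputs zero, it contributes nothing to the forward pass and receives no gradient through that path, which matches the summary's description of neurons not participating in the forward pass or backpropagation.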