ImageNet Classification with Deep Convolutional Neural Networks
[Figure reproduced from the paper] (Left) Eight ILSVRC-2010 test images and the five labels considered most probable by the model. The correct label is written under each image, and the probability assigned to the correct label is also shown with a red bar (if it happens to be in the top 5). (Right) Five ILSVRC-2010 test images in the first column. The remaining columns show the six training images that produce feature vectors in the last hidden layer with the smallest Euclidean distance from the feature vector for the test image.
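The right panel amounts to a nearest-neighbor search in the network's feature space. A minimal sketch with made-up feature vectors, assuming the 4096-dimensional last-hidden-layer activations have already been extracted:

```python
import numpy as np

# Hypothetical 4096-d feature vectors from the last hidden layer
# (random stand-ins here; the real ones come from the trained network).
train_feats = np.random.randn(1000, 4096)  # one row per training image
test_feat = np.random.randn(4096)          # feature vector of one test image

# Euclidean distance from the test image to every training image,
# then the indices of the six nearest training images.
dists = np.linalg.norm(train_feats - test_feat, axis=1)
nearest6 = np.argsort(dists)[:6]
print(nearest6)
```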
Introduction
Image Classification Before This Work
Earlier approaches were based on classical machine learning methods.
Using datasets of labeled images, simple recognition tasks can be solved quite well.
The best error rate on the MNIST digit-recognition task (<0.3%) approaches human performance.
Understanding the Problem Statement
Objects in realistic settings exhibit considerable variability, so to learn to recognize them it is necessary to use much larger training sets.
However, the immense complexity of the object recognition task means that this problem cannot be specified even by a dataset as large as ImageNet.
Procedures
The model should also have lots of prior knowledge to compensate for all the data we don't have. Convolutional Neural Networks (CNNs) constitute one such class of models.
Compared to standard feedforward neural networks with similarly sized layers, CNNs have many fewer connections and parameters, so they are easier to train, while their theoretically best performance is likely to be only slightly worse.
The final network contains five convolutional and three fully-connected layers.
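A rough PyTorch sketch of that eight-layer layout (hypothetical code, not the authors' implementation; the paper's two-GPU split and local response normalization are omitted, but the layer sizes follow the paper):

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Single-GPU sketch of the 5-conv + 3-FC architecture."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),     # conv1
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),          # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2),   # conv2
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),  # conv3
            nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),  # conv4
            nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),  # conv5
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),   # fc6
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),          # fc7
            nn.ReLU(),
            nn.Linear(4096, num_classes),   # fc8
        )

    def forward(self, x):
        x = self.features(x)        # 3x227x227 -> 256x6x6
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNetSketch()
out = model(torch.randn(1, 3, 227, 227))
print(out.shape)  # torch.Size([1, 1000])
```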
The architecture
Pooling: Reducing the image stack
- Pick a window size (usually 2 or 3)
- Pick a stride (usually 2)
- Walk your window across your filtered images
- From each window, take the maximum value
Pooling example
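A minimal NumPy sketch of the procedure above, using a hypothetical max_pool helper with a 2x2 window and stride 2:

```python
import numpy as np

def max_pool(image, window=2, stride=2):
    """Max-pool a 2-D array: slide a window across the image and
    keep the maximum value from each position."""
    h, w = image.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+window,
                          j*stride:j*stride+window]
            out[i, j] = patch.max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [4, 8, 3, 5]])
print(max_pool(x))
# [[6 4]
#  [8 9]]
```

Each 2x2 window is reduced to its maximum, so a 4x4 image becomes 2x2.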
ReLU and Pooling Layers
- Rectified Linear Unit (ReLU): wherever a negative number occurs, swap it out for a 0; positive values pass through unchanged, so the output can grow toward infinity instead of saturating. ReLUs have the desirable property that they do not require input normalization to prevent them from saturating.
- Pooling layer: a stack of images becomes a stack of smaller images.
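ReLU itself is just an element-wise maximum with zero; a one-line NumPy sketch (illustrative, not the paper's code):

```python
import numpy as np

def relu(x):
    # Element-wise ReLU: negatives become 0, positives pass through.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
# [0.  0.  0.  1.5 3. ]
```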
Conclusion
The network achieves top-1 and top-5 test-set error rates of 37.5% and 17.0% on the ILSVRC-2010 dataset.
The best performance achieved during the ILSVRC-2010 competition was 47.1% and 28.2% for the top-1 and top-5 error rates, respectively.
On the Fall 2009 version of ImageNet (which has 10,184 categories and 8.9 million images, hence the higher error rates), this paper's results are 67.4% and 40.9%, attained by the network described above but with an additional, sixth convolutional layer over the last pooling layer.
The best published results on that version of the dataset are 78.1% and 60.9%.
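For reference, a small sketch of how top-k error rates like these are computed (hypothetical top_k_error helper with made-up scores):

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the
    k highest-scoring classes."""
    topk = np.argsort(scores, axis=1)[:, -k:]      # k best classes per row
    hits = (topk == labels[:, None]).any(axis=1)   # true label among them?
    return 1.0 - hits.mean()

# Toy example: 3 examples, 4 classes (made-up scores).
scores = np.array([[0.1, 0.5, 0.2, 0.2],
                   [0.7, 0.1, 0.1, 0.1],
                   [0.2, 0.2, 0.3, 0.3]])
labels = np.array([1, 2, 3])
print(top_k_error(scores, labels, k=1))  # top-1 error
print(top_k_error(scores, labels, k=2))  # top-2 error
```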
Personal Notes
Current models are far better than those of ten years ago. CNNs help process big datasets more efficiently, reduce overfitting, and recognize objects faster and more accurately.
Bibliography
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf