ImageNet Classification with Deep Convolutional Neural Networks
[Figure reproduced from the paper] (Left) Eight ILSVRC-2010 test images and the five labels considered most probable by the model. The correct label is written under each image, and the probability assigned to the correct label is also shown with a red bar (if it happens to be in the top 5). (Right) Five ILSVRC-2010 test images in the first column. The remaining columns show the six training images that produce feature vectors in the last hidden layer with the smallest Euclidean distance from the feature vector for the test image.
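The right panel amounts to a nearest-neighbor search in the network's feature space. A minimal sketch with made-up feature vectors, assuming the 4096-dimensional last-hidden-layer activations have already been extracted:

```python
import numpy as np

# Hypothetical 4096-d feature vectors from the last hidden layer
# (random stand-ins here; the real ones come from the trained network).
train_feats = np.random.randn(1000, 4096)  # one row per training image
test_feat = np.random.randn(4096)          # feature vector of one test image

# Euclidean distance from the test image to every training image,
# then the indices of the six nearest training images.
dists = np.linalg.norm(train_feats - test_feat, axis=1)
nearest6 = np.argsort(dists)[:6]
print(nearest6)
```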
Introduction
Image Classification Before This Work
Earlier approaches were based on classical machine learning methods.
Using datasets of labeled images, simple recognition tasks can be solved quite well.
The best error rate on the MNIST digit-recognition task (<0.3%) approaches human performance.
Understanding the Problem Statement
Objects in realistic settings exhibit considerable variability, so to learn to recognize them it is necessary to use much larger training sets.
However, the immense complexity of the object recognition task means that this problem cannot be specified even by a dataset as large as ImageNet.
Procedures
The model should also have lots of prior knowledge to compensate for all the data we don't have. Convolutional Neural Networks (CNNs) constitute one such class of models.
Compared to standard feedforward neural networks with similarly sized layers, CNNs have many fewer connections and parameters, so they are easier to train, while their theoretically best performance is likely to be only slightly worse.
The final network contains five convolutional and three fully-connected layers.
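A rough PyTorch sketch of that eight-layer layout (hypothetical code, not the authors' implementation; the paper's two-GPU split and local response normalization are omitted, but the layer sizes follow the paper):

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Single-GPU sketch of the 5-conv + 3-FC architecture."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),     # conv1
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),          # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2),   # conv2
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),  # conv3
            nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),  # conv4
            nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),  # conv5
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),   # fc6
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),          # fc7
            nn.ReLU(),
            nn.Linear(4096, num_classes),   # fc8
        )

    def forward(self, x):
        x = self.features(x)        # 3x227x227 -> 256x6x6
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNetSketch()
out = model(torch.randn(1, 3, 227, 227))
print(out.shape)  # torch.Size([1, 1000])
```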
The architecture
Pooling: Reducing the image stack
- Pick a window size (usually 2 or 3)
- Pick a stride (usually 2)
- Walk your window across your filtered images
- From each window, take the maximum value
Pooling example
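A minimal NumPy sketch of the procedure above, using a hypothetical max_pool helper with a 2x2 window and stride 2:

```python
import numpy as np

def max_pool(image, window=2, stride=2):
    """Max-pool a 2-D array: slide a window across the image and
    keep the maximum value from each position."""
    h, w = image.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+window,
                          j*stride:j*stride+window]
            out[i, j] = patch.max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [4, 8, 3, 5]])
print(max_pool(x))
# [[6 4]
#  [8 9]]
```

Each 2x2 window is reduced to its maximum, so a 4x4 image becomes 2x2.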
ReLU and Pooling Layers
- Rectified Linear Unit (ReLU): wherever a negative number occurs, swap it out for a 0; positive values pass through unchanged, so the output can grow toward infinity instead of saturating. ReLUs have the desirable property that they do not require input normalization to prevent them from saturating.
- Pooling layer: a stack of images becomes a stack of smaller images.
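ReLU itself is just an element-wise maximum with zero; a one-line NumPy sketch (illustrative, not the paper's code):

```python
import numpy as np

def relu(x):
    # Element-wise ReLU: negatives become 0, positives pass through.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
# [0.  0.  0.  1.5 3. ]
```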
Conclusion
The network achieves top-1 and top-5 test-set error rates of 37.5% and 17.0% on the ILSVRC-2010 dataset.
The best performance achieved during the ILSVRC-2010 competition was 47.1% and 28.2% for the top-1 and top-5 error rates, respectively.
On the Fall 2009 version of ImageNet (which has 10,184 categories and 8.9 million images, hence the higher error rates), this paper's results are 67.4% and 40.9%, attained by the network described above but with an additional, sixth convolutional layer over the last pooling layer.
The best published results on that version of the dataset are 78.1% and 60.9%.
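For reference, a small sketch of how top-k error rates like these are computed (hypothetical top_k_error helper with made-up scores):

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the
    k highest-scoring classes."""
    topk = np.argsort(scores, axis=1)[:, -k:]      # k best classes per row
    hits = (topk == labels[:, None]).any(axis=1)   # true label among them?
    return 1.0 - hits.mean()

# Toy example: 3 examples, 4 classes (made-up scores).
scores = np.array([[0.1, 0.5, 0.2, 0.2],
                   [0.7, 0.1, 0.1, 0.1],
                   [0.2, 0.2, 0.3, 0.3]])
labels = np.array([1, 2, 3])
print(top_k_error(scores, labels, k=1))  # top-1 error
print(top_k_error(scores, labels, k=2))  # top-2 error
```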
Personal Notes
Current models are far better than those of ten years ago. CNNs help process big datasets more efficiently, reduce overfitting, and recognize objects faster and more accurately.
Bibliography
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf