ImageNet
The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million[1][2] images have been hand-annotated by the project to indicate what objects are pictured, and in at least one million of the images, bounding boxes are also provided.[3] ImageNet contains more than 20,000 categories,[2] with a typical category, such as "balloon" or "strawberry", consisting of several hundred images.[4] The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet.[5] Since 2010, the ImageNet project has run an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes.[6]
Significance for deep learning and relation to other visual recognition challenges
On September 30, 2012, a convolutional neural network (CNN) called AlexNet[7] achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge, more than 10.8 percentage points lower than that of the runner-up. This was made feasible by the use of graphics processing units (GPUs) during training,[7] an essential ingredient of the deep learning revolution. According to The Economist, "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole."[4][8][9]
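Under the top-5 error metric used above, a prediction counts as correct when the true label appears among the model's five highest-scoring classes. A minimal sketch in plain Python, with invented scores and labels (this is not the official ILSVRC evaluation code):

```python
def top5_error(scores, true_labels):
    """Fraction of examples whose true label is not among the
    five highest-scoring predicted classes."""
    errors = 0
    for class_scores, label in zip(scores, true_labels):
        # indices of the five classes with the highest scores
        top5 = sorted(range(len(class_scores)),
                      key=lambda i: class_scores[i], reverse=True)[:5]
        if label not in top5:
            errors += 1
    return errors / len(true_labels)

# Toy example: 2 images, 6 classes each.
scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],  # true class 2 ranks first -> hit
    [0.50, 0.20, 0.10, 0.08, 0.07, 0.05],  # true class 5 ranks last  -> miss
]
true_labels = [2, 5]
print(top5_error(scores, true_labels))  # 0.5
```

AlexNet's 15.3% figure means that for roughly one image in six, the correct class was not even in its top five guesses; the top-1 error (is the single best guess correct?) is always at least as high.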
The AI and computer vision communities, however, were already well aware of the power of CNNs. A GPU implementation of a CNN by K. Chellapilla et al. (2006) was four times faster than an equivalent CPU implementation.[10] A deep CNN by Dan Ciresan et al. (2011) at the Swiss AI lab IDSIA was already 60 times faster[11] and achieved superhuman visual recognition performance in August 2011.[12][13] Between May 15, 2011 and September 10, 2012, their CNN won no fewer than four visual recognition challenges.[14][15] They also significantly improved on the best performance in the literature for multiple image databases.[16]
According to the AlexNet paper,[7] Ciresan's earlier net is "somewhat similar." Both were originally written in CUDA to run with GPU support. Both are variants of CNN designs introduced by Yann LeCun et al. (1989),[17][18] who applied the backpropagation algorithm to a variant of Kunihiko Fukushima's original CNN architecture, the "neocognitron."[19][20] The architecture was later combined with max-pooling, a method introduced by J. Weng et al.[21][15]
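Max-pooling, mentioned above, downsamples a feature map by keeping only the largest activation in each local window, making the representation smaller and more tolerant of small shifts. A minimal sketch of a 2×2 pooling step in plain Python (the 4×4 feature map is invented for illustration):

```python
def max_pool_2x2(grid):
    """2x2 max-pooling with stride 2 over a 2-D list of numbers."""
    pooled = []
    for r in range(0, len(grid) - 1, 2):
        row = []
        for c in range(0, len(grid[0]) - 1, 2):
            # keep only the largest value in each 2x2 window
            row.append(max(grid[r][c], grid[r][c + 1],
                           grid[r + 1][c], grid[r + 1][c + 1]))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 8],
    [1, 0, 3, 7],
]
print(max_pool_2x2(feature_map))  # [[6, 2], [2, 8]]
```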
In 2015, AlexNet was outperformed by Microsoft's very deep CNN with over 100 layers, which won the ImageNet 2015 contest.[22]
History of the database
The database was presented for the first time as a poster at the 2009 Conference on Computer Vision and Pattern Recognition (CVPR) in Florida by researchers from the Computer Science Department at Princeton University.[23][24] ImageNet's primary researchers and inventors include Stanford University computer science professor Fei-Fei Li.[25]
Dataset
ImageNet crowdsources its annotation process. Image-level annotations indicate the presence or absence of an object class in an image, such as "there are tigers in this image" or "there are no tigers in this image". Object-level annotations provide a bounding box around the (visible part of the) indicated object. ImageNet uses a variant of the broad WordNet schema to categorize objects, augmented with 120 categories of dog breeds to showcase fine-grained classification.[6] One downside of WordNet use is that the categories may be more "elevated" than would be optimal for ImageNet: "Most people are more interested in Lady Gaga or the iPod Mini than in this rare kind of diplodocus." In 2012, ImageNet was the world's largest academic user of Mechanical Turk. The average worker identified 50 images per minute.[2]
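The two annotation levels described above can be pictured as simple records keyed by a WordNet synset ID. The sketch below is purely illustrative: the field names, file name, synset ID, and box coordinates are invented and do not reflect ImageNet's actual file formats.

```python
# Image-level annotation: does the class appear in the image at all?
image_level = {
    "image": "example_0001.jpg",
    "wnid": "n02084071",   # WordNet synset ID (illustrative)
    "present": True,       # "there is an object of this class in this image"
}

# Object-level annotation: where is (the visible part of) the object?
object_level = {
    "image": "example_0001.jpg",
    "wnid": "n02084071",
    # bounding box as (x_min, y_min, x_max, y_max) in pixels
    "bbox": (48, 30, 310, 245),
}

def bbox_area(bbox):
    """Area of an axis-aligned bounding box in square pixels."""
    x_min, y_min, x_max, y_max = bbox
    return (x_max - x_min) * (y_max - y_min)

print(bbox_area(object_level["bbox"]))  # (310-48) * (245-30) = 56330
```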
History of the ImageNet Challenge
Since 2010, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been an annual competition in which research teams evaluate their algorithms on the given data set and compete to achieve higher accuracy on several visual recognition tasks. The ILSVRC aims to "follow in the footsteps" of the smaller-scale PASCAL VOC challenge, established in 2005, which contained only about 20,000 images and twenty object classes.[6] The ILSVRC uses a "trimmed" list of only 1,000 image categories or "classes", including 90 of the 120 dog breeds classified by the full ImageNet schema.[6] The 2010s saw dramatic progress in image processing. Around 2011, a good ILSVRC classification error rate was 25%. In 2012, the deep convolutional neural net AlexNet achieved 16%; in the next couple of years, error rates fell to a few percent.[26] While the 2012 breakthrough "combined pieces that were all there before", the dramatic quantitative improvement marked the start of an industry-wide artificial intelligence boom.[4] By 2015, researchers at Microsoft reported that their CNNs exceeded human ability at the narrow ILSVRC tasks.[22][27] However, as one of the challenge's organizers, Olga Russakovsky, pointed out in 2015, the programs only have to identify images as belonging to one of a thousand categories; humans can recognize a larger number of categories and, unlike the programs, can also judge the context of an image.[28]
By 2014, more than fifty institutions participated in the ILSVRC.[6] In 2015, Baidu scientists were banned for a year for using different accounts to greatly exceed the specified limit of two submissions per week.[29][30] Baidu later stated that it fired the team leader involved and that it would establish a scientific advisory panel.[31]
In 2017, 29 of 38 competing teams achieved greater than 95% accuracy.[32] That year, ImageNet stated it would roll out a new, much more difficult challenge in 2018 involving classifying 3D objects using natural language. Because creating 3D data is more costly than annotating a pre-existing 2D image, the dataset is expected to be smaller. Applications of progress in this area range from robotic navigation to augmented reality.[33]
Non-competition results
Around November 2017, Google's AutoML project to evolve new neural net topologies created NASNet, a system optimized for ImageNet and COCO. According to Google, NASNet's performance exceeded all previously published ImageNet performance.[34]
References
- ^ "New computer vision challenge wants to teach robots to see in 3D". New Scientist. 7 April 2017. Retrieved 3 February 2018.
- ^ a b c Markoff, John (19 November 2012). "For Web Images, Creating New Technology to Seek and Find". The New York Times. Retrieved 3 February 2018.
- ^ "ImageNet Summary and Statistics". ImageNet. Retrieved 22 June 2016.
- ^ a b c "From not working to neural networking". The Economist. 25 June 2016. Retrieved 3 February 2018.
- ^ "ImageNet Overview". ImageNet. Retrieved 22 June 2016.
- ^ a b c d e Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.
- ^ a b c Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (June 2017). "ImageNet classification with deep convolutional neural networks" (PDF). Communications of the ACM. 60 (6): 84–90. doi:10.1145/3065386. ISSN 0001-0782. Retrieved 2017-05-24.
- ^ "Machines 'beat humans' for a growing number of tasks". Financial Times. 30 November 2017. Retrieved 3 February 2018.
- ^ Gershgorn, Dave. "The inside story of how AI got good enough to dominate Silicon Valley". Quartz. Retrieved 2018-12-10.
- ^ Kumar Chellapilla; Sid Puri; Patrice Simard (2006). "High Performance Convolutional Neural Networks for Document Processing". In Lorette, Guy. Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft.
- ^ Ciresan, Dan; Ueli Meier; Jonathan Masci; Luca M. Gambardella; Jürgen Schmidhuber (2011). "Flexible, High Performance Convolutional Neural Networks for Image Classification" (PDF). Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Two. 2: 1237–1242. Retrieved 17 November 2013.
- ^ "IJCNN 2011 Competition result table". OFFICIAL IJCNN2011 COMPETITION. 2010. Retrieved 2019-01-14.
- ^ Markoff, John (23 November 2012). "Scientists See Promise in Deep-Learning Programs". The New York Times. Retrieved 3 February 2018.
- ^ Schmidhuber, Jürgen (17 March 2017). "History of computer vision contests won by deep CNNs on GPU". Retrieved 14 January 2019.
- ^ a b Schmidhuber, Jürgen (2015). "Deep Learning". Scholarpedia. 10 (11): 1527–54. CiteSeerX 10.1.1.76.1541. doi:10.1162/neco.2006.18.7.1527. PMID 16764513.
- ^ Ciresan, Dan; Meier, Ueli; Schmidhuber, Jürgen (June 2012). Multi-column deep neural networks for image classification. 2012 IEEE Conference on Computer Vision and Pattern Recognition. New York, NY: Institute of Electrical and Electronics Engineers (IEEE). pp. 3642–3649. arXiv:1202.2745. CiteSeerX 10.1.1.300.3283. doi:10.1109/CVPR.2012.6248110. ISBN 978-1-4673-1226-4. OCLC 812295155. Retrieved 2013-12-09.
- ^ LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D. (1989). "Backpropagation Applied to Handwritten Zip Code Recognition". AT&T Bell Laboratories.
- ^ LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). "Gradient-based learning applied to document recognition" (PDF). Proceedings of the IEEE. 86 (11): 2278–2324. CiteSeerX 10.1.1.32.9552. doi:10.1109/5.726791. Retrieved October 7, 2016.
- ^ Fukushima, K. (2007). "Neocognitron". Scholarpedia. 2 (1): 1717. doi:10.4249/scholarpedia.1717.
- ^ Fukushima, Kunihiko (1980). "Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position" (PDF). Biological Cybernetics. 36 (4): 193–202. doi:10.1007/BF00344251. PMID 7370364. Retrieved 16 November 2013.
- ^ Weng, J; Ahuja, N; Huang, TS (1993). "Learning recognition and segmentation of 3-D objects from 2-D images". Proc. 4th International Conf. Computer Vision: 121–128.
- ^ a b He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). "Deep Residual Learning for Image Recognition". 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- ^ Gershgorn, Dave (2017-07-26). "The data that transformed AI research—and possibly the world". Quartz. Atlantic Media Co. Retrieved 2017-07-26.
- ^ Deng, Jia; Dong, Wei; Socher, Richard; Li, Li-Jia; Li, Kai; Fei-Fei, Li (2009), "ImageNet: A Large-Scale Hierarchical Image Database" (PDF), 2009 conference on Computer Vision and Pattern Recognition
- ^ Li, Fei-Fei, How we're teaching computers to understand pictures, retrieved 2018-12-16
- ^ Robbins, Martin (6 May 2016). "Does an AI need to make love to Rembrandt's girlfriend to make art?". The Guardian. Retrieved 22 June 2016.
- ^ Markoff, John (10 December 2015). "A Learning Advance in Artificial Intelligence Rivals Human Abilities". The New York Times. Retrieved 22 June 2016.
- ^ Aron, Jacob (21 September 2015). "Forget the Turing test – there are better ways of judging AI". New Scientist. Retrieved 22 June 2016.
- ^ Markoff, John (3 June 2015). "Computer Scientists Are Astir After Baidu Team Is Barred From A.I. Competition". The New York Times. Retrieved 22 June 2016.
- ^ "Chinese search giant Baidu disqualified from AI test". BBC News. 14 June 2015. Retrieved 22 June 2016.
- ^ "Baidu fires researcher involved in AI contest flap". PCWorld. 11 June 2015. Retrieved 22 June 2016.
- ^ Gershgorn, Dave (10 September 2017). "The Quartz guide to artificial intelligence: What is it, why is it important, and should we be afraid?". Quartz. Retrieved 3 February 2018.
- ^ "New computer vision challenge wants to teach robots to see in 3D". New Scientist. 7 April 2017. Retrieved 3 February 2018.
- ^ "Google AI creates its own 'child' bot". The Independent. 5 December 2017. Retrieved 5 February 2018.