Inception v3[1][2] is a convolutional neural network (CNN) for assisting in image analysis and object detection, and it began as a module for GoogLeNet. It is the third edition of Google's Inception CNN, originally introduced during the ImageNet Large Scale Visual Recognition Challenge. The design of Inception v3 was intended to allow deeper networks while keeping the number of parameters from growing too large: it has "under 25 million parameters", compared with 60 million for AlexNet.
Just as ImageNet can be thought of as a database of classified visual objects, Inception helps with the classification of objects[3] in the field of computer vision. The Inception v3 architecture has been reused in many different applications, often "pre-trained" on ImageNet. One such use is in the life sciences, where it aids leukemia research.[4]
In 2014, a team at Google developed GoogLeNet, which won the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The name came from the LeNet of 1998, since both LeNet and GoogLeNet are CNNs. The team also called the architecture "Inception" after the "we need to go deeper" internet meme, a phrase from the 2010 film Inception. Because more versions were released later, the original Inception architecture was retroactively renamed "Inception v1".
The models and the code were released under the Apache 2.0 license on GitHub.

The Inception v1 architecture is a deep CNN composed of 22 layers. Most of these layers were "Inception modules". The original paper stated that Inception modules are a "logical culmination" of the Network in Network architecture, guided by the theoretical work of Arora et al.[5]
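To make the module structure concrete, here is a minimal PyTorch sketch of a four-branch Inception module whose outputs are concatenated along the channel dimension. PyTorch itself and the exact channel widths are assumptions for illustration (the widths below happen to match GoogLeNet's first inception block), not a reproduction of the released code.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Four parallel branches concatenated channel-wise, in the style of
    Inception v1. Channel widths are illustrative assumptions."""
    def __init__(self, in_ch):
        super().__init__()
        # Branch 1: a single pointwise 1x1 convolution
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, 64, 1), nn.ReLU())
        # Branch 2: 1x1 channel reduction, then a 3x3 convolution
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, 1), nn.ReLU(),
                                nn.Conv2d(96, 128, 3, padding=1), nn.ReLU())
        # Branch 3: 1x1 channel reduction, then a 5x5 convolution
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
                                nn.Conv2d(16, 32, 5, padding=2), nn.ReLU())
        # Branch 4: 3x3 max-pooling, then a 1x1 projection
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1), nn.ReLU())

    def forward(self, x):
        # Every branch preserves the spatial size, so the outputs can be
        # concatenated along the channel dimension.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
print(InceptionModule(192)(x).shape)  # torch.Size([1, 256, 28, 28])
```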
Since Inception v1 is deep, it suffered from the vanishing gradient problem. The team solved this by using two "auxiliary classifiers": linear-softmax classifiers inserted at one-third and two-thirds of the network's depth. During training, the loss function is a weighted sum of all three classifier losses, with each auxiliary loss discounted by a factor of 0.3: L = L_main + 0.3 L_aux,1 + 0.3 L_aux,2.
The auxiliary classifiers were removed after training was complete. The vanishing gradient problem was later addressed more directly by the residual connections of the ResNet architecture.
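A minimal sketch of that weighted training loss, again in PyTorch; the function name and the use of cross-entropy for each classifier are assumptions for illustration, while the 0.3 discount weight follows the GoogLeNet paper.

```python
import torch.nn.functional as F

def inception_v1_loss(main_logits, aux1_logits, aux2_logits, targets):
    """Total training loss: the main classifier loss plus the two auxiliary
    classifier losses, each discounted by a factor of 0.3."""
    main = F.cross_entropy(main_logits, targets)
    aux1 = F.cross_entropy(aux1_logits, targets)
    aux2 = F.cross_entropy(aux2_logits, targets)
    return main + 0.3 * aux1 + 0.3 * aux2
```

At inference time only `main_logits` is used, since the auxiliary heads are removed after training.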
Inception v2 was released in 2015. It improves on Inception v1 by using factorized convolutions. For example, a single 5×5 convolution can be factored into two stacked 3×3 convolutions.
Both have a receptive field of size 5×5. The 5×5 convolution kernel has 25 parameters, compared to just 18 in the factorized version (two 3×3 kernels of 9 parameters each). Since a single 5×5 kernel can represent any linear map computed by two stacked 3×3 kernels, the 5×5 convolution is strictly more powerful than the factorized version. However, this power is not necessarily needed. Empirically, the research team found that factorized convolutions help, as the sketch below illustrates for the parameter counts.
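A small PyTorch comparison of the two options (a hypothetical single-channel setup, chosen so the parameter counts equal the kernel sizes):

```python
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

# A single 5x5 convolution versus two stacked 3x3 convolutions.
# With one input and one output channel and no bias, the parameter
# counts are exactly the kernel sizes: 25 versus 2 * 9 = 18.
single = nn.Conv2d(1, 1, kernel_size=5, padding=2, bias=False)
stacked = nn.Sequential(
    nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False),
)
print(count_params(single), count_params(stacked))  # 25 18
```

Both `single` and `stacked` map a pixel to a function of its 5×5 neighborhood, so the receptive fields match while the factorized version saves parameters and compute.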
Inception v3 was also released in 2015. It improves on Inception v2 by using an RMSProp optimizer, label smoothing regularization, factorized 7×7 convolutions, and batch normalization in the auxiliary classifier.
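Two of these training-time changes can be sketched in PyTorch as follows; the hyperparameter values and the stand-in model are illustrative assumptions, not the paper's full training setup.

```python
import torch
import torch.nn as nn

# Label smoothing: soften the one-hot targets by a small epsilon.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# RMSProp optimizer over the network parameters.
model = nn.Linear(2048, 1000)  # stand-in for the full Inception v3 network
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.045)
```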
In 2016,[7] the team released Inception v4, Inception ResNet v1, and Inception ResNet v2.
Inception v4 is an incremental update with even more factorized convolutions, along with other refinements that were empirically found to improve benchmark results.
Inception ResNet v1 and v2 are both modifications of Inception v4, where residual connections are added to each Inception module.
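A minimal PyTorch sketch of the idea of adding a residual connection around an Inception-style block; the branch widths and the 1×1 projection back to the input channel count are illustrative assumptions, not the exact Inception ResNet design.

```python
import torch
import torch.nn as nn

class InceptionResBlock(nn.Module):
    """An Inception-style block with a residual (skip) connection,
    in the spirit of Inception ResNet. Widths are illustrative."""
    def __init__(self, ch):
        super().__init__()
        # Two parallel Inception-style branches
        self.b1 = nn.Conv2d(ch, 32, 1)
        self.b2 = nn.Sequential(nn.Conv2d(ch, 32, 1), nn.ReLU(),
                                nn.Conv2d(32, 32, 3, padding=1))
        # 1x1 projection restores the channel count so the
        # residual addition is well-defined
        self.project = nn.Conv2d(64, ch, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        branches = torch.cat([self.b1(x), self.b2(x)], dim=1)
        return self.relu(x + self.project(branches))  # residual addition

x = torch.randn(1, 256, 17, 17)
print(InceptionResBlock(256)(x).shape)  # torch.Size([1, 256, 17, 17])
```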