Computer vision has become ubiquitous in our society, with applications in many fields. In this project, we focus on one recognition task in computer vision, namely image captioning. The problem of generating natural-language descriptions for images is still regarded as an open one, and it has been studied more rigorously in the domain of videos. In recent years, greater emphasis has been placed on still images and their description in human-understandable natural language. The task of detecting scenes and objects has become easier thanks to the research carried out over the last few years. The main aim of our project is to train convolutional neural networks with various hyperparameters on large image datasets such as Flickr 8k, combine the resulting image features from pretrained classifiers such as ResNet with a recurrent neural network, and obtain the desired caption for the image. In this paper we present the detailed architecture of the image captioning model.
Keywords: Computer Vision, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Xception, Flickr 8k, LSTM, Preprocessing.
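
To make the CNN + RNN pipeline described above concrete, the following is a minimal Keras sketch of a merge-style captioning model, not the exact implementation presented in this paper. The vocabulary size, maximum caption length, and layer widths are illustrative assumptions; the 2048-dimensional feature vector is assumed to come from a pretrained CNN encoder such as Xception or ResNet with global average pooling.

# Minimal sketch of a CNN-feature + LSTM captioning model (assumed values).
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000      # assumed vocabulary size built from the caption corpus
max_length = 34        # assumed maximum caption length after preprocessing
feature_dim = 2048     # assumed size of the pooled CNN image-feature vector

# Image branch: pre-extracted CNN features projected to a 256-dim embedding.
img_input = Input(shape=(feature_dim,))
img_dense = Dense(256, activation="relu")(Dropout(0.5)(img_input))

# Text branch: the partial caption is embedded and encoded by an LSTM.
seq_input = Input(shape=(max_length,))
seq_embed = Embedding(vocab_size, 256, mask_zero=True)(seq_input)
seq_lstm = LSTM(256)(Dropout(0.5)(seq_embed))

# Merge the two branches and predict the next word of the caption.
decoder = Dense(256, activation="relu")(add([img_dense, seq_lstm]))
output = Dense(vocab_size, activation="softmax")(decoder)

model = Model(inputs=[img_input, seq_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()

At inference time, such a model is queried word by word: the image features and the caption generated so far are fed in, the most probable next word is appended, and the loop repeats until an end-of-sequence token or the maximum length is reached.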