Image Captionbot for Assistive Technology

Authors

  • Safiya K. M., Dr. R. Pandian

Abstract

Generating short descriptions from an image is a difficult task because of the complexity of image features and the vastness of language contexts. An image may contain a wide variety of information, so extracting the context of that information and generating a sentence from it is a complex task. However, such a system can help blind people understand their surroundings without assistance from others. Deep learning techniques have emerged as a powerful approach for building this kind of system. In this project, we use VGG16, one of the best-performing CNN architectures for image classification, to extract features from images. An embedding layer and an LSTM are used for the text description, and the two networks are combined to form an image caption generation network. The model is trained on data prepared from the Flickr8k dataset. The trained model generates captions for new images, and each generated caption is converted to audio to assist the blind.
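The architecture described above, a VGG16 feature branch merged with an embedding-plus-LSTM text branch, can be sketched as follows. This is a minimal illustration assuming TensorFlow/Keras; the vocabulary size, caption length, and 256-unit layer widths are assumptions for the example, not values stated in the abstract.

```python
# Sketch of the described caption-generation network (assumes TensorFlow/Keras).
# The image branch takes 4096-d VGG16 fc2 features; the text branch embeds the
# partial caption and runs it through an LSTM; the two are merged to predict
# the next word. vocab_size and max_length are illustrative assumptions.
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000   # assumed: vocabulary built from Flickr8k captions
max_length = 34     # assumed: longest caption length in tokens

# Image feature branch: pre-extracted VGG16 features -> 256-d representation.
image_input = Input(shape=(4096,))
img = Dropout(0.5)(image_input)
img = Dense(256, activation="relu")(img)

# Text branch: token sequence -> embedding -> LSTM.
text_input = Input(shape=(max_length,))
txt = Embedding(vocab_size, 256, mask_zero=True)(text_input)
txt = Dropout(0.5)(txt)
txt = LSTM(256)(txt)

# Merge both branches and predict the next word of the caption.
merged = add([img, txt])
merged = Dense(256, activation="relu")(merged)
output = Dense(vocab_size, activation="softmax")(merged)

model = Model(inputs=[image_input, text_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

At inference time, the model would be called word by word, feeding each predicted token back into the text input until an end-of-sequence token is produced; the final caption string can then be passed to any text-to-speech library to produce audio output.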


Published

2022-08-14