An Approach towards Video Captioning in Bengali

Authors

  • M. M. Rushadul Mannan, Mostafizur Rahman, Md. Shahir Zaoad, Md. Mahbubur Rahman, Angshu Bikash Mandol, Md. Adnanul Islam

Abstract

Video captioning refers to the process of predicting a semantically consistent textual description from a given video clip. Although a significant body of research exists on video captioning in English, video captioning in Bengali remains nearly unexplored. This research therefore aims to generate Bengali captions that plausibly describe the gist of a short video. To accomplish this, a sequence-to-sequence model based on Long Short-Term Memory (LSTM) networks takes video frame features as input and generates a corresponding textual description. The study uses the Microsoft Research Video Description Corpus (MSVD), which is an English-language dataset, so a deep learning-based translator combined with manual effort is used to convert the English captions into appropriate Bengali ones. Finally, the model's performance is evaluated using two popular metrics, BLEU and TER. The proposed approach achieves a BLEU score of 0.38 and a TER score of 0.76, establishing a new benchmark for Bengali video captioning.
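
The abstract reports scores on two metrics without saying which variants were used. The sketch below is an assumption-laden illustration of how such numbers are typically computed, not the authors' evaluation code. It assumes whitespace tokenization, corpus-level BLEU with uniform weights up to 4-grams and no smoothing, and a simplified TER that omits TER's block-shift operation. Without shifts, the edit count can only be equal to or higher than that of full TER. For BLEU higher is better; for TER (edits divided by average reference length) lower is better. The helper names (`corpus_bleu`, `ter_no_shifts`, `edit_distance`) are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU, uniform weights, no smoothing (assumed variant).

    hypotheses: list of token lists; references: list of lists of token lists.
    """
    clipped = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Clip each n-gram count by its maximum count in any single reference.
            max_ref = Counter()
            for r in refs:
                max_ref = max_ref | ngrams(r, n)
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0  # some n-gram precision is zero, so the geometric mean is zero
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_prec)


def edit_distance(a, b):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def ter_no_shifts(hyp, refs):
    """Simplified TER: fewest edits to any reference / average reference length.

    Omits the block-shift edit of full TER (an assumption for brevity).
    """
    edits = min(edit_distance(hyp, r) for r in refs)
    avg_ref_len = sum(len(r) for r in refs) / len(refs)
    return edits / avg_ref_len


# Usage with a hypothetical Bengali reference caption ("a man is playing a guitar").
ref = "একটি লোক গিটার বাজাচ্ছে".split()
hyp = "একটি লোক গান গাইছে".split()
print(corpus_bleu([ref], [[ref]]))  # identical caption: BLEU 1.0
print(ter_no_shifts(hyp, [ref]))    # 2 substitutions / 4 reference words
```

Production toolkits such as sacreBLEU standardize tokenization and implement full TER with shifts. Reported scores are only comparable when the same variant is used on both sides.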

Published

2022-07-21