University: University of Stirling
Title: Image/Video auto-captioning using Deep Learning
It is a simple and effortless task for humans to perceive a visual scene and describe it in natural language. Yet for a computer this task is highly complex and entails many challenges. One of the great ambitions of Artificial Intelligence (AI) is to create computers that can observe and understand the surrounding world in all its richness and communicate that understanding in natural language. The aim of this project is to create an algorithm for efficient auto-captioning of images and videos using Deep Learning and Natural Language Processing that exploits the intermodal correspondences between language and visual data. Such an algorithm can be used in a wide range of application areas that benefit society. Examples include a) helping the visually impaired, b) autonomous car vision, c) drone vision for monitoring, security and surveillance in public/private spaces and d) companion robot vision, where the robot can describe its surroundings.
This project proposed a novel application for the economically significant creative industries. It successfully demonstrates how key scenes in an entertainment video are described automatically, with the descriptions then used in video browsing applications. The core of the project rests on the development of an auto-captioning algorithm that combines a convolutional neural network for automatic object identification within the scene, a recurrent neural network to generate natural text descriptions, automatic video segmentation that identifies significant scene changes, and an interactive video player interface.
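One common approach to the scene-change segmentation mentioned above is to compare intensity histograms of consecutive frames and flag a scene boundary when the difference exceeds a threshold. The sketch below illustrates that idea in minimal form; the frame representation (flat lists of grayscale pixels), bin count, and threshold are all assumptions for illustration, not the project's actual implementation.

```python
# Hedged sketch: histogram-based scene-change detection.
# Frames are assumed to be flat lists of grayscale pixel values (0-255);
# the bin count and threshold are illustrative choices, not project values.

def histogram(frame, bins=8, max_val=256):
    """Build a normalised intensity histogram for one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_val] += 1
    total = len(frame)
    return [c / total for c in counts]

def histogram_distance(h1, h2):
    """Sum of absolute per-bin differences (ranges from 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def find_scene_changes(frames, threshold=0.5):
    """Return indices of frames where the histogram jumps past the threshold."""
    changes = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        if histogram_distance(prev, cur) > threshold:
            changes.append(i)
        prev = cur
    return changes

# Toy example: three dark frames followed by three bright frames.
frames = [[10] * 100] * 3 + [[200] * 100] * 3
print(find_scene_changes(frames))  # the cut is detected at frame index 3
```

In a real pipeline the flagged frames would mark where a new caption is generated, so each scene between two boundaries gets one description.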
The project also developed an interactive, easy-to-use video player that brings the power of AI to general users. Its features include a) a progress bar that displays markers (wherever there is a significant change in the scene) over which the user can hover to read a description of that scene, and b) text search to find a particular scene within an archive of films/videos. For example, the query 'Where is London Bridge' returns a list of all videos featuring London Bridge, along with the precise locations of those scenes within each video. To the best of our knowledge, this is the first project to develop an auto-captioning-based application for the creative industries.
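The text-search feature described above can be realised by indexing each scene's generated caption against its video and timestamp, then matching query words against the captions. The sketch below shows this with simple whole-word matching; the captions, filenames, and timestamps are invented for illustration and are not from the project.

```python
# Hedged sketch: keyword search over per-scene captions.
# The index entries below are invented examples, not project data.

scene_index = [
    {"video": "city_tour.mp4", "time": 42.0,
     "caption": "a red bus crossing london bridge at sunset"},
    {"video": "city_tour.mp4", "time": 310.5,
     "caption": "people walking through a busy market"},
    {"video": "travel_vlog.mp4", "time": 75.2,
     "caption": "a boat passing under london bridge"},
]

def search_scenes(query, index):
    """Return scenes whose caption contains every word of the query."""
    words = query.lower().split()
    return [scene for scene in index
            if all(w in scene["caption"].split() for w in words)]

for hit in search_scenes("london bridge", scene_index):
    print(f'{hit["video"]} @ {hit["time"]}s: {hit["caption"]}')
```

A production system would likely use stemming or embedding-based similarity rather than exact word matching, but the structure — captions keyed by video and timestamp — is the same.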
Originally from Hungary, Albert Jozsa-Kiraly studied at the University of Stirling after high school and received his BSc (Hons) Computing Science degree (first class) in June 2019. While an undergraduate, he also completed two software engineering internships in Hungary and studied abroad for a semester in California. Following his graduation from the University of Stirling, he progressed to the MSc in Computer Science at the University of Southampton, beginning in September 2019. The two computing areas that interest him most are software engineering and artificial intelligence. His outside interests include swimming, hiking, and travelling.