This project aimed to create a model capable of detecting and explaining emotions in video advertisements. The final framework consists of two main stages: emotion detection and explanation generation.
This work was the basis of my fourth-year university project.[1]
The initial aim of this project was to create a model that, given an advertisement video, could classify its underlying emotions. This was achieved using a convolutional neural network (CNN) trained on the Pitts dataset. More information on this model can be found here.
To build on the classification model, I built a framework that explains the model's decisions. Visual cues that give insight into a model's inner workings are becoming crucial for understanding large classification models. I achieved this by extending two standard explainability frameworks for image models: Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). The resulting framework highlights which areas of specific frames the model found to be the funniest and most exciting.
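To make the idea of attributing a score to frame regions concrete, here is a minimal sketch of exact Shapley values over a handful of regions. It is not the project's implementation: the four-region split, the `toy_funny_score` function standing in for the CNN, and all of its numbers are hypothetical, chosen only for illustration. Exact enumeration like this is feasible only for a few regions, which is why practical tools rely on approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(score, n_regions):
    """Exact Shapley value of each region for a set-valued scoring function."""
    players = range(n_regions)
    values = []
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude region i.
            weight = (
                factorial(k) * factorial(n_regions - k - 1) / factorial(n_regions)
            )
            for subset in combinations(others, k):
                visible = frozenset(subset)
                # Marginal gain from revealing region i on top of `visible`.
                total += weight * (score(visible | {i}) - score(visible))
        values.append(total)
    return values


def toy_funny_score(visible):
    """Hypothetical stand-in for the classifier's 'funny' score.

    `visible` is the set of frame regions left unmasked. Region 2 carries
    most of the score, and region 3 only helps when region 2 is also shown.
    """
    score = 0.1
    if 2 in visible:
        score += 0.6
    if 2 in visible and 3 in visible:
        score += 0.2
    return score


attributions = shapley_values(toy_funny_score, 4)
top_region = max(range(4), key=lambda r: attributions[r])
print(attributions, top_region)
```

In this toy setup the attributions sum to the difference between the score with every region visible and the score with none visible, and the region with the largest value is the one a heatmap would highlight most strongly.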
The image below shows how the framework generates an output. We see here, for example, that the model found this minion falling over a chair to be the funniest part of this video.

[1] This was also the basis for a research paper submitted to SIGIR 2023.