
Master Thesis Defense Session

Master Thesis Defense Session by Sara Azima Haghighi, Deep Learning for Multi-modal Multi-label Emotion Recognition

News Code: 17

Publishing Date: 18 Sep 2022 0:52

In The Name of God



Computer Engineering, Artificial Intelligence



Supervisor:

Dr. Hossein Karshenas Najafabadi


Advisor:

Dr. Hamid Reza Baradaran Kashani

Internal Reviewer:

Dr. Peyman Adibi

External Reviewer:

Dr. Marjan Kaedi


Student:

Sara Azima Haghighi

Date: 18 September 2022

Time: 11:00 AM


Ansari building, Third floor, Dr. Braani Hall



Deep Learning for Multi-modal Multi-label Emotion Recognition

The importance of mental health in improving quality of life is undeniable, and it has therefore received significant attention in human societies. Affective computing, an interdisciplinary field between computer science and psychology, analyzes various aspects of human mental and psychological states using computer algorithms. One of the important applications of this field is the recognition of affect and emotion. In the real world, emotions are recognized and understood through different forms of information such as images, sound, and text. The simultaneous occurrence of different modalities allows humans to identify the expressed emotion more accurately. Additionally, human emotional behavior and expression are so complex that they cannot easily be described and distinguished from one another by a single emotion label. In this research, a deep learning approach is presented for effectively fusing different modalities to recognize emotions as a multi-label classification problem. To this end, after encoding the information obtained from the different modalities using convolutional and recurrent memory structures in a deep neural network, an attention mechanism is used to integrate the multimodal information and represent it in a common hidden space. The datasets examined in this study consist of subtitled video clips, which comprise three modalities: image, speech, and text. One of the main challenges in such datasets is the complexity of classification caused by the imbalance among the labels. To extract different emotions from such multimodal datasets, a multi-label multimodal cost function is presented that improves the performance of the model's learning process.

The results of the experiments show that the proposed model performs competitively against many state-of-the-art emotion recognition methods.
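The thesis itself is not reproduced here, but the pipeline the abstract outlines (modality encodings fused by attention, a sigmoid head per label, and a class-weighted loss for label imbalance) can be sketched roughly as follows. All dimensions, weights, and the `pos_weight` values are hypothetical placeholders, not values from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical encoded features for the three modalities, each assumed
# already projected into a shared hidden space of dimension d.
d, n_labels = 8, 4
feats = {m: rng.normal(size=d) for m in ("image", "speech", "text")}

# Attention over modalities: score each modality vector against a shared
# query vector, normalize with softmax, and fuse by weighted sum.
query = rng.normal(size=d)
scores = np.array([feats[m] @ query for m in feats])
weights = softmax(scores)
fused = sum(w * feats[m] for w, m in zip(weights, feats))

# Multi-label head: one independent sigmoid per emotion label, so a clip
# can carry several emotions at once.
W = rng.normal(size=(n_labels, d))
probs = sigmoid(W @ fused)

# Class-weighted binary cross-entropy: rarer labels receive a larger
# positive weight, one common way to counter label imbalance.
y = np.array([1, 0, 1, 0])            # illustrative ground-truth labels
pos_weight = np.array([1.0, 1.0, 3.0, 3.0])
eps = 1e-9
loss = -np.mean(pos_weight * y * np.log(probs + eps)
                + (1 - y) * np.log(1 - probs + eps))
```

The attention weights sum to one, so the fused vector stays in the same hidden space as the per-modality encodings, and the per-label sigmoids keep the labels independent rather than mutually exclusive.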