Multimodal Recognition of Valence, Arousal and Dominance via Late-Fusion of Text, Audio and Facial Expressions

Home » Publications » Multimodal Recognition of Valence, Arousal and Dominance via Late-Fusion of Text, Audio and Facial Expressions

2023

Multimodal Recognition of Valence, Arousal and Dominance via Late-Fusion of Text, Audio and Facial Expressions

Annette Rios, Uwe Reichel, Chirag Bhuvaneshwara, Panagiotis Filntisis, Petros Maragos, Felix Burkhardt, Florian Eyben, Björn Schuller,Fabrizio Nunnari and Sarah Ebling

We present an approach for the prediction of valence, arousal, and dominance of people communicating via text/audio/video streams for a translation from and to sign languages.

The approach consists of the fusion of the output of three CNN-based models dedicated to the analysis of text, audio, and facial expressions. Our experiments show that any combination of two or three modalities increases prediction performance for valence and arousal

Doi 10.14428/esann/2023.ES2023-128

link to publication

A scientific publication by audEERING GmbH.
More info on our research page