Dawn of the transformer era in speech emotion recognition: closing the valence gap
2022
J. Wagner, A. Triantafyllopoulos, H. Wierstorf, M. Schmitt, F. Eyben, B. W. Schuller, F. Burkhardt
Recent advances in transformer-based architectures that are pre-trained in a self-supervised manner have shown great promise in several machine learning tasks. In the audio domain, such architectures have also been successfully utilised in the field of speech emotion recognition (SER). However, existing works have not evaluated the influence of model size and pre-training data on downstream performance, and have paid limited attention to generalisation, robustness, fairness, and efficiency. The present contribution conducts a thorough analysis of these aspects on several pre-trained variants of wav2vec 2.0 and HuBERT.
March 16, 2022, CC BY-NC-SA 4.0