2025

Testing of speech emotion recognition models

Derington, A., Wierstorf, H., Özkil, A., Eyben, F., Burkhardt, F., and Schuller, B. W.

Testing correctness, fairness, and robustness of speech emotion recognition models. IEEE Transactions on Affective Computing

Machine learning models for speech emotion recognition (SER) can be trained for different tasks and are usually evaluated on only a few available datasets per task. Tasks include the prediction of arousal, valence, dominance, emotional categories, or tone of voice. These models are mainly evaluated in terms of correlation or recall, and their predictions always contain some errors. These errors manifest themselves in model behaviour, which can differ considerably along different dimensions even if the model achieves the same recall or correlation. This paper introduces a testing framework for investigating the behaviour of SER models: each test requires a metric to reach a certain threshold in order to pass. The test metrics are grouped into correctness, fairness, and robustness.
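
The idea of threshold-based tests can be illustrated with a minimal sketch. It is not the framework from the paper: the function names, the toy labels, the choice of metrics (unweighted average recall, recall gap between two speaker groups, share of predictions unchanged under added noise), and all threshold values are assumptions made only for illustration.

```python
# Illustrative sketch only: metric choices, names, data, and thresholds
# are assumptions, not taken from the paper.

def unweighted_average_recall(truth, prediction):
    """Mean of the per-class recalls (each class counts equally)."""
    recalls = []
    for label in sorted(set(truth)):
        indices = [i for i, t in enumerate(truth) if t == label]
        hits = sum(1 for i in indices if prediction[i] == label)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


def select(values, groups, wanted):
    """Keep only the entries that belong to the wanted group."""
    return [v for v, g in zip(values, groups) if g == wanted]


def check(value, threshold, higher_is_better=True):
    """A test passes when the metric reaches its threshold."""
    passed = value >= threshold if higher_is_better else value <= threshold
    return {"value": value, "threshold": threshold, "passed": passed}


# Toy data (hypothetical): labels, speaker groups, and model outputs
truth = ["happy", "sad", "happy", "sad", "happy", "sad"]
group = ["f", "f", "f", "m", "m", "m"]
pred_clean = ["happy", "sad", "happy", "happy", "happy", "sad"]
pred_noisy = ["happy", "sad", "sad", "happy", "happy", "sad"]

# Correctness: recall on the clean test data must be high enough
uar = unweighted_average_recall(truth, pred_clean)

# Fairness: recall should not differ much between speaker groups
uar_f = unweighted_average_recall(
    select(truth, group, "f"), select(pred_clean, group, "f")
)
uar_m = unweighted_average_recall(
    select(truth, group, "m"), select(pred_clean, group, "m")
)
recall_gap = abs(uar_f - uar_m)

# Robustness: predictions should stay the same when noise is added
unchanged = sum(c == n for c, n in zip(pred_clean, pred_noisy))
unchanged_share = unchanged / len(pred_clean)

results = {
    "correctness": check(uar, threshold=0.75),
    "fairness": check(recall_gap, threshold=0.10, higher_is_better=False),
    "robustness": check(unchanged_share, threshold=0.80),
}

for name, result in results.items():
    status = "passed" if result["passed"] else "failed"
    print(f"{name}: {result['value']:.2f} ({status})")
```

In this toy example the model passes the correctness and robustness checks but fails the fairness check, which shows how a model with acceptable overall recall can still behave very differently along another dimension.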

A scientific publication by audEERING GmbH.