Today, we are happy to announce the public release of devAIceⓇ SDK 3.7.0. This update comes with several noteworthy model updates for emotion and age recognition, the deprecation of the Sentiment module, as well as numerous other minor tweaks, improvements and fixes.
As always, we recommend that all users of devAIceⓇ SDK update to the latest version in order to take advantage of model enhancements, fixes, and new functionality.
What’s new in devAIceⓇ SDK 3.7.0
Emotion model robustness improvements
The dimensional and categorical emotion models that are part of the Emotion (Large) module have been updated in this release. In our benchmarks, the new versions of these models show significantly higher robustness to background noise and changes in recording conditions compared to the previous models. We therefore expect the new models to provide more accurate predictions when the analysed audio contains noise artifacts. Likewise, recording the same speaker using different microphones or under different recording conditions (e.g. recording volume, distance of the speaker to the microphone, reverb) should lead to more consistent emotion results with the new models.
When updating to this newer version of the models, please keep in mind that any application-specific thresholds or logic you may have defined based on the output of the previous version may need to be re-evaluated and adjusted for the new version. Even though the value ranges and semantics of the emotion dimensions and categories have not changed between versions, the new models may still behave differently in ways that invalidate previously determined logic and thresholds.
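One way to re-evaluate such application-specific thresholds is to re-run the calibration procedure on outputs of the new models. The sketch below is a minimal, hypothetical illustration of this idea: the scores and labels are invented and do not come from devAIce SDK; they stand in for an emotion-dimension output (e.g. arousal in [0, 1]) collected on an application-specific validation set.

```python
# Hypothetical illustration: scores and labels below are made up and merely
# stand in for emotion-dimension outputs on a labelled validation set.

def best_threshold(scores, labels):
    """Pick the cut-off on a model score that maximizes accuracy
    against binary application labels (1 = positive class)."""
    best_t, best_acc = 0.5, 0.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# The same clips scored by the previous and the new model (values invented):
old_scores = [0.2, 0.4, 0.6, 0.8]
new_scores = [0.1, 0.3, 0.7, 0.9]
labels = [0, 0, 1, 1]

# The optimal threshold can shift between model versions even when the
# value range and semantics of the output stay the same.
print(best_threshold(old_scores, labels))  # 0.6
print(best_threshold(new_scores, labels))  # 0.7
```

The point of the sketch is only that the calibration step itself carries over unchanged between model versions; the threshold it produces does not.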
New generation age model
The 3.7.0 update of devAIceⓇ SDK comes with a revised age model included as part of the Speaker Attributes module. The previous model has been replaced with a significantly more accurate and robust model utilizing a modern deep learning-based architecture. The new model predicts speaker age with a mean absolute error (MAE) of 10.56 years when tested on real-world clean and noisy speech data, meaning that its predictions are, on average, about 11 years off the true age of a speaker. For comparison, the age model included in previous versions of devAIce exhibited a mean absolute error of 19.51 years on the same test set. The new model thus almost halves the prediction error of its predecessor.
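For readers unfamiliar with the metric, MAE is simply the average of the absolute differences between predicted and true values. A small worked example, using invented ages purely for illustration:

```python
# Worked example of the mean absolute error (MAE) metric.
# All ages below are invented for illustration only.
true_ages = [25, 34, 47, 61, 29]
predicted = [31, 30, 55, 50, 38]

# MAE = mean of |true - predicted| over all test samples, here in years.
mae = sum(abs(t - p) for t, p in zip(true_ages, predicted)) / len(true_ages)
print(mae)  # 7.6
```

An MAE of 10.56 years on the real test set means that, averaged over all test speakers, the model's age estimate deviates from the true age by roughly that amount.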
The improvement in accuracy comes with an increase in runtime CPU and memory consumption over the previous model, since the new model is significantly larger and more complex. We recommend that users updating from a previous version check whether the new model's resource requirements are acceptable for their applications. Users who are only interested in the gender output of the Speaker Attributes module and do not need the age output should disable the age output via the corresponding module configuration setting to save resources and speed up the analysis.
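Conceptually, such a configuration might look like the following. Note that this is a hypothetical sketch only: the dictionary keys and option names below are not the real devAIceⓇ SDK API; please consult the SDK documentation for the actual configuration settings of the Speaker Attributes module.

```python
# Hypothetical sketch only: these key names are NOT the real devAIce SDK API.
# The idea is that skipping the age output avoids loading and running the
# larger new age model, saving CPU and memory when only gender is needed.
config = {
    "speaker_attributes": {
        "enable_gender": True,
        "enable_age": False,  # disable to save resources and speed up analysis
    }
}

print(config["speaker_attributes"]["enable_age"])  # False
```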
Deprecation of the Sentiment module
The Sentiment module has been deprecated and removed from devAIceⓇ SDK with this release. This does not affect the Multi-Modal Emotion module, which combines acoustic and text-based analysis and continues to be available and fully supported.
devAIceⓇ has always been about leveraging the advantages of acoustic-based analysis methods over traditional text-based methods. Going forward, we will be strengthening our focus on acoustic and multi-modal approaches. We suggest that customers consider third-party solutions for purely text-based analysis, or remain on the previous release, 3.6.1, if they depend on the Sentiment module in devAIceⓇ.
Other changes and improvements
Other improvements that this update introduces include refinements in the speaking rate estimation, a modernized iOS app project sample, and a set of smaller API enhancements and minor bug fixes.
As always, you can find a full list of all changes in this release in the official changelog document that comes as part of the devAIceⓇ SDK package.
Also, please note that only the models of the Emotion (Large) module have been updated in this release. The models underlying the Emotion module remain unchanged for now. We are currently evaluating whether similar robustness improvements can also be brought to the smaller Emotion module in a future release.