devAIce® SDK 3.8 and 3.9 Updates – New Powerful Module

Milenko Saponja

We are pleased to unveil the latest updates in devAIce® SDK 3.8 and 3.9, which bring substantial improvements and new features to your development experience. This blog post gives an overview of the most notable enhancements in these releases, including a new module for analyzing audio quality.

Improved Scene Model: Next-Generation Capabilities

We are excited to introduce the next generation of the scene model, now integrated into the SDK. This upgraded module covers 21 classes, up from the previous 14. The following classes have been added: café, crowded indoor, elevator, kitchen, residential area, restroom, and subway station. Despite the increased class count, the new model achieves higher accuracy, reaching approximately 60% across all classes, depending on the input signal duration. Additionally, we have added a moving-window average output to the module, which enables real-time usage and lets you adapt the module to different use cases by adjusting the window size.
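To illustrate the idea behind a moving-window average, here is a minimal sketch in Python. It is not the devAIce® API: the `MovingAverage` class, the `top_class` helper, and the per-frame probability lists are assumptions made purely for illustration. The sketch averages the most recent class probabilities, so a larger window gives a steadier scene label and a smaller one reacts faster to changes.

```python
from collections import deque


class MovingAverage:
    """Average the most recent `window_size` probability vectors (hypothetical helper)."""

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # deque with maxlen drops the oldest entry automatically
        self.window = deque(maxlen=window_size)

    def update(self, probs):
        """Add one per-frame probability vector and return the windowed average."""
        self.window.append(list(probs))
        count = len(self.window)
        # zip(*...) groups the values of each class across the stored frames
        return [sum(column) / count for column in zip(*self.window)]


def top_class(labels, probs):
    """Return the label with the highest averaged probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


if __name__ == "__main__":
    labels = ["kitchen", "elevator"]  # two of the scene classes, for brevity
    smoother = MovingAverage(window_size=2)
    for frame in ([1.0, 0.0], [0.0, 1.0], [0.0, 1.0]):
        averaged = smoother.update(frame)
        print(averaged, top_class(labels, averaged))
```

With a window of two frames, a single outlier frame only shifts the average halfway, which is the smoothing effect the window size controls.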

Improved Speaker Attributes: Precision and Inclusivity Combined

To achieve greater accuracy and inclusivity, we have improved the Speaker Attributes module. The age model has been replaced with a more accurate alternative, reducing the mean absolute error to less than 10 years. Notably, the new model surpasses the human baseline on this task. Furthermore, the gender model, which determines the perceived gender of a speaker, has been upgraded to a new three-class model that includes a dedicated child class. This addition addresses the absence of gender differentiation for children and enables a more comprehensive analysis of speaker attributes.
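For readers less familiar with the metric, mean absolute error (MAE) is the average absolute difference between predicted and true ages. The short Python sketch below shows the calculation; the function name and the sample ages are made up for illustration and are not results from the devAIce® age model.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


if __name__ == "__main__":
    # Illustrative values only, not measurements from the SDK
    true_ages = [30, 50, 20]
    predicted_ages = [36, 41, 23]
    print(mean_absolute_error(true_ages, predicted_ages))
```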

Introducing Audio Quality Analysis Module

The new Audio Quality module is now available in the latest devAIce® SDK. This powerful tool enables thorough analysis of audio signal quality, so users can detect anomalies or issues in the audio input and perform quality checks. The module is designed to work seamlessly with other modules: it lets you exclude audio snippets, for example those with excessive background noise, before further analysis. It delivers two output values for audio quality assessment: signal-to-noise ratio (SNR) and reverberation time (RT60).
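As a rough sketch of how such a quality check could gate other modules, the Python example below filters snippets by SNR and RT60. Everything in it is an assumption for illustration: the `QualityResult` type, the function names, and the threshold values (10 dB and 1.0 s) are hypothetical and do not come from the devAIce® SDK or its documentation.

```python
from dataclasses import dataclass


@dataclass
class QualityResult:
    """Hypothetical container for the two quality values of one snippet."""
    snr_db: float   # signal-to-noise ratio in decibels
    rt60_s: float   # reverberation time in seconds


def is_usable(result, min_snr_db=10.0, max_rt60_s=1.0):
    """Accept a snippet only if it is neither too noisy nor too reverberant.

    The default thresholds are placeholders; suitable values depend on the use case.
    """
    return result.snr_db >= min_snr_db and result.rt60_s <= max_rt60_s


def filter_snippets(snippets, min_snr_db=10.0, max_rt60_s=1.0):
    """Return the names of the snippets that pass the quality check."""
    return [
        name
        for name, result in snippets
        if is_usable(result, min_snr_db, max_rt60_s)
    ]


if __name__ == "__main__":
    snippets = [
        ("clean.wav", QualityResult(snr_db=25.0, rt60_s=0.4)),
        ("noisy.wav", QualityResult(snr_db=3.0, rt60_s=0.4)),
        ("hall.wav", QualityResult(snr_db=20.0, rt60_s=2.5)),
    ]
    print(filter_snippets(snippets))
```

Only the snippets that pass would then be handed on to modules such as emotion or speaker attribute analysis.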

Enhanced Events Model: Acoustic Event Detection (AED)

The Events module has been renamed to Acoustic Event Detection (AED). Equipped with a new model, the AED module delivers improved precision and significantly improved recall in detecting speech and music. Higher recall means that fewer actual speech and music events are missed, which makes your results more reliable.
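As a reminder of what the two metrics measure, the following Python sketch computes precision and recall for one event class from per-segment labels. The function name and the example labels are invented for illustration and do not represent AED output or benchmark data.

```python
def precision_recall(true_labels, predicted_labels, positive):
    """Compute precision and recall for the class named `positive`."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    tp = fp = fn = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if prediction == positive and truth == positive:
            tp += 1  # event present and detected
        elif prediction == positive:
            fp += 1  # event detected but not present
        elif truth == positive:
            fn += 1  # event present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative labels only
    truth = ["speech", "speech", "speech", "music"]
    predicted = ["speech", "speech", "music", "music"]
    print(precision_recall(truth, predicted, positive="speech"))
```

In this toy example every detected speech segment is correct, but one real speech segment is missed, so precision is perfect while recall is not.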

Additional Enhancements and Improvements 

In addition to the aforementioned updates, we have introduced several other notable changes and improvements: 

  • The Emotion (Large) and Emotion modules now offer new settings that let users selectively enable either the dimensional or the categorical model. This reduces resource consumption when only one of the two is needed.
  • On the API side, we have introduced a serial key to strengthen security.
  • Our documentation has been revised throughout and now includes detailed information on resource usage and memory consumption. We have also updated the descriptions of loudness and intonation in the Prosody module to make them clearer.

Prepare to leverage the full potential of devAIce® SDK 3.9 with these exciting updates! Stay tuned for more detailed insights on each feature in our upcoming technical blog posts.