Following the recent SDK updates, we are thrilled to announce the latest enhancements to the devAIce® Web API. These updates bring a host of new features and improvements that help developers get the most out of our technology. Let’s explore the additions in devAIce® Web API 4.2 and 4.3.
Enhanced Scene Model: Next-Generation Capabilities
The new generation of our scene model has been integrated into the Web API. It recognizes 21 classes, up from the previous 14. The seven new classes are café, crowded indoor, elevator, kitchen, residential area, restroom, and subway station, which broadens the range of scenes the model can identify. Despite the larger class count, the model's accuracy has improved significantly, reaching approximately 60% across all classes, depending on the input signal duration. We have also extended the module with a moving-window average output, which supports real-time use; the window size can be adjusted to suit different use cases.
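To illustrate how a moving-window average behaves, here is a minimal sketch in Python. It is not the devAIce® implementation: the class name `MovingAverage`, the per-frame score dictionaries, and the example scores are all assumptions made for illustration. The idea is simply to average the most recent per-class scores, so that a larger window gives steadier output and a smaller window reacts faster.

```python
from collections import deque


class MovingAverage:
    """Average the most recent per-class score dictionaries.

    Hypothetical helper for illustration; not part of the devAIce Web API.
    """

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # A deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=window_size)

    def update(self, scores):
        """Add one frame of scores and return the averaged scores."""
        self.frames.append(scores)
        count = len(self.frames)
        return {
            label: sum(frame[label] for frame in self.frames) / count
            for label in scores
        }


# Illustrative scores for two of the scene classes.
smoother = MovingAverage(window_size=2)
first = smoother.update({"kitchen": 0.8, "elevator": 0.2})
second = smoother.update({"kitchen": 0.4, "elevator": 0.6})
third = smoother.update({"kitchen": 0.0, "elevator": 1.0})
```

With a window of two frames, the third result depends only on the second and third inputs, because the first frame has already left the window.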
Upgraded Speaker Attributes: Precision and Inclusivity Combined
To achieve higher accuracy and inclusivity, we have enhanced the Speaker Attributes module. The age model has been replaced with a more accurate one, reducing the mean absolute error to less than 10 years. Notably, the model outperforms the human baseline on this task. Additionally, the gender model, which determines the perceived gender of a speaker, has been upgraded to a new three-class model that includes a child class. The child class addresses the lack of gender differentiation in children's voices, making the analysis of speaker attributes more comprehensive.
Introducing Audio Quality Analysis Module
The latest devAIce® Web API introduces the all-new Audio Quality module for analyzing audio signal quality. With this module, users can run quality checks to identify anomalies or issues in the audio input. It is designed to work alongside the other modules and supports decisions such as excluding audio snippets with excessive background noise. The module provides two output values, signal-to-noise ratio (SNR) and reverberation time (RT60), for assessing audio quality.
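As a rough sketch of how such a quality check might gate further processing, the Python example below keeps only snippets whose SNR and RT60 fall within chosen limits. Everything here is an assumption for illustration: the `QualityResult` type, the field names `snr_db` and `rt60_s`, and the threshold values are hypothetical and do not reflect the actual Web API response format or any recommended settings.

```python
from dataclasses import dataclass


@dataclass
class QualityResult:
    """Hypothetical container for one snippet's quality values."""
    snippet_id: str
    snr_db: float   # signal-to-noise ratio in decibels (assumed unit)
    rt60_s: float   # reverberation time in seconds (assumed unit)


def is_usable(result, min_snr_db=10.0, max_rt60_s=1.0):
    """Return True if the snippet is clean enough for further analysis.

    The default thresholds are illustrative, not recommended values.
    """
    return result.snr_db >= min_snr_db and result.rt60_s <= max_rt60_s


def filter_snippets(results, min_snr_db=10.0, max_rt60_s=1.0):
    """Return the ids of snippets that pass both quality checks."""
    return [
        r.snippet_id
        for r in results
        if is_usable(r, min_snr_db, max_rt60_s)
    ]


# Made-up values for three snippets.
results = [
    QualityResult("a", snr_db=25.0, rt60_s=0.4),  # clean
    QualityResult("b", snr_db=3.0, rt60_s=0.5),   # too noisy
    QualityResult("c", snr_db=18.0, rt60_s=1.8),  # too reverberant
]
usable = filter_snippets(results)
```

Suitable thresholds depend on the downstream module and the recording conditions, so they are best tuned on representative data.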
Enhanced Events Model: Acoustic Event Detection (AED)
The Events module has been overhauled and renamed to Acoustic Event Detection (AED). Equipped with a more advanced model, AED achieves higher precision and significantly improved recall in detecting speech and music events. In practice, higher precision means fewer false detections and higher recall means fewer missed events, which makes the results more reliable.
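For readers less familiar with these two metrics, the short Python sketch below shows how precision and recall are computed from detection counts. The function name and the example counts are made up for illustration and are not measured devAIce® results.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from detection counts.

    precision = TP / (TP + FP): share of detections that were correct.
    recall    = TP / (TP + FN): share of real events that were detected.
    Returns 0.0 for a metric whose denominator is zero.
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Illustrative counts: 8 correct detections, 2 false alarms, 2 missed events.
precision, recall = precision_recall(8, 2, 2)
```

A detector can trade one metric for the other by changing its decision threshold, which is why an improvement in both at once is notable.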
Additional Enhancements and Improvements
In addition to the aforementioned updates, we have implemented several other noteworthy changes and improvements:
- The CLI tool can now also run on older Web API versions, ensuring broader compatibility and convenience.
- Our documentation has been improved in several places, including clearer descriptions of loudness and intonation in the Prosody module.
Stay tuned for more detailed insights on each feature in our upcoming technical blog posts. Unlock the full potential of devAIce® Web API 4.3 with these exciting updates!