Your voice has the power to enable natural human-machine interaction.



Humanizing Machines
Powered by Voice AI

Make voice as central to human-machine interaction as it is to your daily conversations.
With Voice AI, machines can behave interactively, naturally, and individually.

This results in personalized settings, deep insights into users’ needs, and empathy as a KPI of digitalization.
With Voice AI, the greatest possible added value is extracted from human-machine interaction.

devAIce® is our Audio & Voice AI solution that can be used in numerous use cases:

  • Market Research
  • Robotics & IoT
  • Automotive IoT
  • Healthcare

Software or hardware
Audio analysis for any product

devAIce® analyzes emotional expression, acoustic scenes, and many other features of audio. Our AI models are optimized for low resource consumption and perform solidly even with limited CPU power. Many devAIce® models run in real-time on embedded, low-power ARM devices such as the Raspberry Pi and other SoCs.

Apps, Programs & Web Applications​
IoT devices​
Wearables & Smart Devices​
Voice Bots & Conversational AI Tools​
Automotive IoT​
XR-projects with Unity & Unreal plugin​

devAIce®: the core technology
Several models included

devAIce® comprises a total of 11 modules that can be combined and used depending on the application and context. Take a look at the devAIce® factsheet or scroll through the module overview for more information.

Emotions expressed
through voice

Emotional expression is transmitted through the voice: in everyday life, we automatically perform an auditory analysis whenever we speak with someone. devAIce® focuses on vocal emotional expression and derives emotional dimensions from voice analysis.


The VAD (Voice Activity Detection) module detects the presence of voice in an audio stream. The detection is highly robust to noise and independent of the volume level of the voice, so even faint voices can be detected in the presence of louder background noises.

Detecting voice before analyzing large amounts of audio material makes the analysis process efficient and resource-saving: if VAD runs first, large amounts of non-voice data can be filtered out and excluded from the voice analysis itself.
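The VAD-first filtering step can be sketched as follows. Note that this toy example uses a simple energy gate, whereas the actual devAIce® VAD is model-based and volume-independent; the sketch only illustrates how pre-filtering reduces the data passed to downstream modules.

```python
# Toy illustration of VAD-first filtering (not the devAIce® algorithm):
# frames below an energy threshold are dropped before any further analysis.

def frame_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame)

def filter_voiced(frames, threshold=0.01):
    """Keep only frames whose energy exceeds the threshold."""
    return [f for f in frames if frame_energy(f) > threshold]

silence = [0.001] * 160                    # near-silent frame
speech = [0.3, -0.2, 0.25] * 53 + [0.3]    # louder, voice-like frame
kept = filter_voiced([silence, speech])
# Only the louder frame survives, so downstream modules process less data.
```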

The Emotion module performs emotion recognition on voice. The module is designed to detect emotions in all languages. Currently, the module combines two independent emotion models with different output representations:

  • A dimensional arousal-valence-dominance emotion model
  • A four-class categorical emotion model: happy, angry, sad, and neutral

In devAIce® we offer two Emotion modules: Emotion & Emotion Large. 

If you are working with limited computational resources or running your application on devices with little memory, the Emotion module is the better fit; the Emotion Large module offers higher prediction accuracy at a higher resource cost.
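The two output representations listed above could be held in a result shape like the following sketch. The field names are illustrative assumptions, not the actual SDK API.

```python
from dataclasses import dataclass

@dataclass
class EmotionResult:
    # Dimensional model: arousal, valence, dominance,
    # assumed here to be normalized to [0, 1]
    arousal: float
    valence: float
    dominance: float
    # Categorical model: one of "happy", "angry", "sad", "neutral"
    category: str

# Hypothetical result for a tense, negative-sounding utterance
result = EmotionResult(arousal=0.72, valence=0.35, dominance=0.6, category="angry")
```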

The Multi-Modal Emotion module combines acoustics- and linguistics-based emotion recognition in a single module. It achieves higher accuracy than models that are limited to only one of the modalities (e.g. the model provided by the Emotion module). Acoustic models tend to perform better at estimating the arousal dimension of emotions, while linguistic models excel at predicting valence (positivity/negativity). The Multi-Modal Emotion module fuses information from both modalities to improve the prediction accuracy.
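The fusion idea described above can be illustrated with a minimal late-fusion sketch: weight the acoustic prediction more heavily for arousal and the linguistic prediction more heavily for valence. The weights and function are assumptions for illustration, not the module's actual fusion method.

```python
def fuse(acoustic, linguistic, w_arousal=0.8, w_valence=0.2):
    """Weighted late fusion of two emotion predictions:
    trust acoustics more for arousal, linguistics more for valence."""
    return {
        "arousal": w_arousal * acoustic["arousal"]
                   + (1 - w_arousal) * linguistic["arousal"],
        "valence": w_valence * acoustic["valence"]
                   + (1 - w_valence) * linguistic["valence"],
    }

fused = fuse(
    {"arousal": 0.9, "valence": 0.4},   # acoustic model output
    {"arousal": 0.5, "valence": 0.8},   # linguistic model output
)
```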

The Acoustic Scene module distinguishes between three classes:

  • Indoor 
  • Outdoor
  • Transport


Further subclasses are recognized in each acoustic scene class. This model is currently under development – the specific subclasses will be named in the next update.

The AED (Acoustic Event Detection) module runs acoustic event detection for multiple acoustic event categories on an audio stream.

Currently, speech and music are supported acoustic event categories. The model allows events of different categories to overlap temporally.
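Temporal overlap between event categories can be pictured with a simple interval representation. This data shape is an assumption for illustration, not the module's output format.

```python
# Events as (start_s, end_s, label); events of different
# categories may overlap in time, e.g. speech over background music.
events = [(0.0, 5.0, "music"), (2.0, 4.0, "speech")]

def overlapping(a, b):
    """True if the two (start, end, label) intervals overlap in time."""
    return a[0] < b[1] and b[0] < a[1]

# The speech event (2.0-4.0 s) overlaps the music event (0.0-5.0 s).
```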

The Speaker Attributes module estimates the personal attributes of speakers from voice and speech. These attributes include:

  • A perceived gender (sex), divided into the categories:
    • female adult
    • male adult
    • child
  • A perceived age, in years


devAIce® provides two gender submodules called Gender and Gender (Small).

The age-gender models are trained on self-reported gender.

The Speaker Verification module evaluates whether the speaker in a recording is the same as a previously enrolled reference speaker.
Two modes are supported:

  • Enrollment mode: a speaker model for a reference speaker is created or updated based on one or more reference recordings.
  • Verification mode: a previously created speaker model is used to estimate how likely the same speaker is present in a given recording.

*Speaker Verification is currently in development and available as a beta version.
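A common way to realize the two modes is to average speaker embeddings at enrollment and score verification by cosine similarity. The following sketch shows that pattern under those assumptions; it is not the module's actual implementation.

```python
import math

def enroll(embeddings):
    """Enrollment: average one or more reference embeddings
    into a single speaker model."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def verify(model, embedding):
    """Verification: cosine similarity between the speaker model
    and a test embedding (closer to 1.0 = more likely same speaker)."""
    dot = sum(m * x for m, x in zip(model, embedding))
    norm = math.sqrt(sum(m * m for m in model)) * math.sqrt(sum(x * x for x in embedding))
    return dot / norm

model = enroll([[0.9, 0.1], [1.0, 0.0]])   # two reference recordings
same = verify(model, [0.95, 0.05])         # high score: likely same speaker
other = verify(model, [0.0, 1.0])          # low score: likely different speaker
```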


The Prosody module computes the following prosodic features:

  • F0 (in Hz)
  • Loudness
  • Speaking rate (in syllables per second)
  • Intonation


Running the Prosody module in combination with VAD:

  • If the VAD module is disabled, the full audio input is analyzed as a single utterance, and one prosody result is generated.
  • When VAD is enabled, the audio input is first segmented by the VAD before the Prosody module is run on each detected voice segment. In this case, individual prosody results for each segment will be output.
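The two behaviors above can be sketched as a simple dispatch on whether VAD segmentation is available. The function names and placeholder result are assumptions for illustration only.

```python
def prosody(samples):
    # Placeholder: a real implementation would compute F0,
    # loudness, speaking rate, and intonation for this segment.
    return {"n_samples": len(samples)}

def analyze_prosody(audio, vad_segments=None):
    """Without VAD segments, the full input is one utterance with one result;
    with VAD segments, one prosody result is produced per voice segment."""
    if vad_segments is None:
        return [prosody(audio)]
    return [prosody(audio[start:end]) for start, end in vad_segments]

whole = analyze_prosody(list(range(16000)))                # one result
per_segment = analyze_prosody(list(range(16000)),
                              [(0, 4000), (8000, 12000)])  # one result per segment
```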

The openSMILE Features module performs feature extraction on speech. Currently, it includes the following two feature sets based on openSMILE:

  • ComParE-2016

This feature set consists of a total of 6373 audio features that are constructed by extracting energy, voicing, and spectral low-level descriptors and computing statistical functionals on them such as percentiles, moments, peaks, and temporal features.

While originally designed for the task of speaker emotion recognition, it has been shown to also work well for a wide range of other audio classification tasks.

  • GeMAPS+

The GeMAPS+ feature set is a proprietary extension of the GeMAPS feature set described in Eyben et al. [2]. The set consists of a total of 276 audio features and has been designed as a minimalistic, general-purpose set for common analysis and classification tasks on voice.
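The pattern behind both feature sets, extracting low-level descriptor contours and then summarizing them with statistical functionals, can be sketched minimally. This is a conceptual illustration, not openSMILE itself.

```python
import statistics

def functionals(lld):
    """Statistical functionals over one low-level descriptor contour
    (e.g. a frame-wise energy or F0 track)."""
    qs = statistics.quantiles(lld, n=4)
    return {
        "mean": statistics.mean(lld),
        "stddev": statistics.stdev(lld),
        "p25": qs[0],
        "median": qs[1],
        "p75": qs[2],
        "max": max(lld),
    }

# Toy frame-wise energy contour; real sets apply many functionals
# to many descriptors, yielding thousands of features per utterance.
energy_contour = [0.1, 0.4, 0.35, 0.2, 0.5, 0.3]
feats = functionals(energy_contour)
```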


Jabra's Engage AI
Powered by audEERING

Jabra’s Engage AI is the Call Center tool for enhanced conversations between agents and clients. The integration of audEERING’s advanced AI-powered audio analysis technology devAIce® into the call center context provides a new level of agent and customer experience. The tool fits into numerous contexts, always with the goal of improving communication.

MW Research
Voice analysis in Market Research

“Improve product or communication testing with emotional feedback! Our method analyzes the emotional state of your customers during the evaluation. This gives you a comprehensive insight into the emotional user experience.”

MW Research

More about Market Research at audEERING ›


Hanson Robotics
Robots with Empathy

The cooperation between Hanson Robotics and audEERING also seeks to further develop Sophia Hanson’s social skills. In the future, Sophia will recognize the emotions of her conversation partners and be able to respond empathically.

In nursing and other fields affected by the shortage of skilled workers, such robots equipped with social AI can help in the future.

More about Robotics at audEERING ›

The Simulation Crew

“Emotions are an essential part of our interactions,” says Eric Jutten, CEO of Dutch company The Simulation Crew.

To make their VR trainer, Iva, emotionally capable, they chose devAIce® XR. Powered by the XR plugin, they integrated Voice AI into their product. Read the full story.


devAIce® SDK:
native on-device

The devAIce® SDK is available for all major desktop, mobile and embedded platforms. It also performs well on devices with low computational resources, like wearables and hearables.

  • Platforms: Windows, Linux, macOS, Android, iOS
  • Processor architectures: x86-64, ARMv8

devAIce® Web API: cloud-powered,
native for the web

devAIce® Web API is the easiest way to integrate audio AI into your web- and cloud-based applications. On-premise deployment options are available for the highest data security requirements. The Web API is accessible:

  • via the command-line (CLI) tool
  • via included client libraries: Python, .NET, Java, JavaScript, and PHP
  • by directly sending HTTP requests
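For the direct-HTTP route, a request could be built along these lines. The endpoint URL, payload fields, and auth scheme below are placeholder assumptions; consult the devAIce® Web API documentation for the real values.

```python
import json
import urllib.request

# Hypothetical endpoint and payload, for illustration only.
req = urllib.request.Request(
    "https://api.example.com/v1/analyze",
    data=json.dumps({"modules": ["vad", "emotion"]}).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <API_KEY>",  # placeholder credential
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # would send the request
```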




devAIce® XR: the Unity & Unreal plugin

devAIce® XR brings emotions and intelligent audio analysis into virtual environments. The plugin is designed to be integrated into your Unity or Unreal project. Don’t miss the moment to include the most important part of interaction: empathy.

Product Owner Talk
With Milenko Saponja



Several Use Cases
For Voice AI technology

Market Research


Automotive IoT


Robotics & IoT




Customers, Projects &
Success stories


Hanson Robotics

Hanson AI. Developing Meaningful AI Interactions.
Hanson Robotics Ltd.,
AI & robotics company

Sales Boost

Based on the devAIce® Web API, SalesBoost developed its own metric to meet the needs of its customers.
Sales Boost LLC,

Jabra Elite 85h

These headphones offer an optimal sound experience thanks to intelligent acoustic scene analysis.
Smart Headphones,
by Jabra


“With entertAIn observe we were able to massively shorten the analysis time.”
Christian Ress,

Patient Comfort Simulator

“It helped us add a new layer of emotional interaction to our scenarios in VR.”
Eric Jutten,


“audEERING’s audio intelligence is a powerful and flexible component that can adapt quickly.”
Simon Clarke,
Head of Operations


The German call center 11880 uses callAIser™ for easier handling of customer calls.
Emotional state,
callcenter performance

Market Builder Voice

“It detects certain aspects of emotions from audio recordings just as well as humans can.”
Prof. Dr. Raimund Wildner,
Managing Director & Vice President