In our previous blog posts, we explained the role of emotions in video games and gave an overview of human-machine interaction and what it means for the video game industry. We found that, from a technical perspective, we have reached almost full freedom in the 3D environment. For example, CAVEs give you the freedom to navigate a virtual environment naturally and in real time.
Emotional interaction
The next step is to add another layer, with its own dimensions and modes, to our interaction. If you have ever traveled to a foreign country where you don’t speak the local language, you know what it means to talk to people and hear them without understanding a single word they say. Yet you still speak in your native language, using your tone and your body language as tools to connect and communicate. That’s the magic of emotional interaction.
Recognition of emotional states through the voice
You don’t have to be a world traveler to know that we as humans connect through an ancient form of language: the language of emotions. Although the extent of our expressiveness is heavily influenced by the environment we are raised in and by how we are taught to regulate our emotions, the underlying patterns for the major emotions are the same across all cultures. Our laughter and crying, our anger and excitement are expressed with the same vocal tract that Homo sapiens used 60,000 years ago. Generation after generation, we have been trained to detect these cues in the voice and instantly associate them with emotional states or behavioral patterns.
Thousands of voice samples for each emotion
Can artificial intelligence learn this as well? Using deep learning methods, computers are fed thousands and thousands of voice samples for each emotion, mirroring the process our brains have gone through since childhood. Training the model takes a lot of computing power, but once it is finished, the model can run in real time on any device. Nowadays, AI can detect basic emotions as well as humans can. The applications of this technology are countless, but let us stick to the video game industry and human-computer interaction here.
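To make the idea more concrete, here is a minimal sketch of such a training pipeline in Python, assuming a toy dataset of WAV clips sorted into one folder per emotion; the paths, the MFCC feature choice, and the model size are illustrative assumptions, not a description of any particular product:

```python
# Minimal sketch: training a speech-emotion classifier from labeled clips.
# Assumption (illustrative): clips are WAV files sorted into one folder per
# emotion, e.g. data/angry/*.wav, data/happy/*.wav.
import glob
import os

import librosa
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def extract_features(path):
    """Summarize a clip as its mean MFCC vector -- a common, lightweight feature."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40)
    return mfcc.mean(axis=1)  # one 40-dim vector per clip

X, y = [], []
for label_dir in glob.glob("data/*/"):
    label = os.path.basename(os.path.dirname(label_dir))  # folder name = emotion
    for wav in glob.glob(os.path.join(label_dir, "*.wav")):
        X.append(extract_features(wav))
        y.append(label)

X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.2, random_state=0
)

# A small feed-forward network: training is the expensive part, but the
# resulting model is tiny and fast enough for real-time use on any device.
clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

The expensive step is the training itself; the finished classifier is small, which is exactly the property that lets it run in real time later on.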
Emotion AI
Emotion AI enables us to complete the emotional interaction cycle: computers can detect the emotional state of the user, be aware of it, and react accordingly. The field behind this is affective computing, the study and development of systems and devices that can recognize, interpret, process, and simulate human affects. It is an interdisciplinary field spanning computer science, psychology, and cognitive science.
The idea of completing this cycle is not a 2020 invention. With each advancement in sensor and detection technology, scientists have tried to complete it. Heartbeat, sweat on your hands, eye movements, and brain waves are some examples of the signals used in these efforts. Although they all proved effective in lab settings and small-scale launches, they did not change the video game industry. The reason is simple: scalability and availability.
Audio Emotion AI
Despite the benefits of engagement, natural interaction, and additional immersion, video gamers couldn’t and wouldn’t attach ECG (electrocardiogram) electrodes or GSR (galvanic skin response) sensors to themselves before pressing the play button. Thus, the integration of affective computing and video games remained in scientific papers and never made it to the general public.
But what if we could use sensors that are already out there and a mode of affective computing that scales? This question brings us to Audio Emotion AI. Microphones are everywhere, and talking is the most natural mode of interaction for us. Using Audio Emotion AI, we can complete the interaction cycle: this time the sensors exist on every device, the models are lightweight, and the affective mode of interaction is natural.
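As a rough illustration of how little extra hardware this requires, the sketch below captures short windows of speech from an ordinary microphone and feeds them to a classifier like the one trained above; the `sounddevice` library, the two-second window, and the simple polling loop are illustrative assumptions:

```python
# Minimal sketch: streaming emotion detection from an ordinary microphone.
# Assumes `clf` is the fitted classifier from the training sketch above;
# window length and loop structure are illustrative, not prescriptive.
import librosa
import sounddevice as sd

SAMPLE_RATE = 16000
WINDOW_SECONDS = 2  # classify the most recent two seconds of speech

while True:
    # Record one window from the default microphone and wait until it is full.
    audio = sd.rec(int(WINDOW_SECONDS * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()
    # Same lightweight features as in training: the mean 40-dim MFCC vector.
    mfcc = librosa.feature.mfcc(y=audio.flatten(), sr=SAMPLE_RATE, n_mfcc=40)
    features = mfcc.mean(axis=1).reshape(1, -1)
    emotion = clf.predict(features)[0]
    print("detected emotion:", emotion)  # a game would react here instead
```

Nothing in this loop needs special hardware: a built-in microphone and a few milliseconds of compute per window are enough, which is what makes the approach scalable where ECG and GSR were not.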
The benefit of integrating emotions
With all barriers removed, the question is: why should we actually do this? What is the benefit of integrating emotions into our daily interactions? And what is the danger of a lack of emotion in those interactions?