How emotionally intelligent AI really is today

Part 1

by Felix Burkhardt, Director Research at audEERING

Emotionally intelligent AI is discussed everywhere in the media and in industry at the moment, but what is the actual state of the technology? This is the first of two posts on this topic.

Are we ready for emotions?

In large parts of the world the barrier between humans and machines is getting lower thanks to the pervasive Internet, fueled by two trends: first, ubiquitous computing via smartphones, wearables, glasses and implants, and second, home and vehicle automation via smart speakers, interlinked home components and entertainment systems.
With the vast growth in man-machine communication, the most natural form of communication given to humans comes into focus: speech. But speech is much more than just words: speech is an expression of the soul (if there is one). Most of what we express is defined not by the words we use but by the way we say them. As a forgotten Greek actor once boasted: “I can make the audience cry just by reciting the alphabet!”

Ignoring this huge well of information is one of the bigger shortcomings (alongside the lack of real intelligence) of current so-called AI bots like Siri, Alexa, Cortana or Google Now. Without it, urgency, disinterest, disgust and irony go undetected and cannot be acted upon, although all of them are vital for an interaction that would earn the designation “natural”.
The “emotional channel” was long ignored. I remember a salesman from a large speech technology provider answering my question about emotional speech synthesis, just 15 years ago: “We concentrate on stability and leave research to academia.” But this is changing now, fueled of course by the current AI hype.

Emotional artificial intelligence is a comparatively new field, but there is tremendous momentum in the area. Supported by a plethora of newly developed open-source components, modules, libraries and languages to extract acoustic features from audio and feed them into machine learning frameworks, every reasonably able programmer can now throw together a first prototype of an emotion-aware dialog machine in about two working days.
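
To give an idea of what such a prototype boils down to, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the "audio" is a synthetic sine wave instead of a recording, the two features (RMS energy and zero-crossing rate) are a tiny stand-in for the large feature sets that real toolkits extract, the labels "calm" and "aroused" are invented, and the nearest-centroid classifier merely takes the place of a proper machine learning framework.

```python
import math

def sine(freq_hz, amplitude, n=1600, rate=16000):
    """Synthetic stand-in for a recorded utterance (assumption: no real audio)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

def features(samples):
    """Two toy acoustic features: RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(samples) - 1)
    return (rms, zcr)

def train(labelled):
    """Nearest-centroid 'training': average the feature vectors per label."""
    sums = {}
    for label, samples in labelled:
        f = features(samples)
        total, count = sums.get(label, ((0.0, 0.0), 0))
        sums[label] = ((total[0] + f[0], total[1] + f[1]), count + 1)
    return {label: (t[0] / c, t[1] / c) for label, (t, c) in sums.items()}

def classify(centroids, samples):
    """Return the label whose centroid is closest to the sample's features."""
    f = features(samples)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))

# Hypothetical training data: quiet, low-pitched vs. loud, high-pitched signals.
centroids = train([
    ("calm", sine(120, 0.2)),
    ("calm", sine(140, 0.25)),
    ("aroused", sine(300, 0.8)),
    ("aroused", sine(340, 0.9)),
])

print(classify(centroids, sine(130, 0.22)))
print(classify(centroids, sine(320, 0.85)))
```

On such cleanly separated toy signals this works trivially. With real speech, real speakers and real recording conditions, the picture is much less clear.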

Besides many SMEs, all the big companies like Amazon, Microsoft, IBM or Apple already have solutions for emotion recognition from facial expression analysis on the market and surely have internal development for recognition from speech. Many smaller companies offer services for sentiment detection from text, bio-signal analysis, and audio analysis. But does the technology deliver what the marketers promise?


The applications are manifold: emotion recognition might help with automated market research, a field where many companies already offer their services: they monitor target groups interacting with a new product and measure the emotional reaction objectively.
Stress or tiredness detection can help to make traffic safer, interest or boredom detection is an obvious candidate for e-learning software, and speaker classification can help automated dialogs adapt to the user the way humans would. To mention more fields: automated security observation, believable characters in gaming, or training software for professional performers such as salespeople and politicians come to mind.
Health care and wellbeing is another vast field: monitoring emotional expression might help people understand others and themselves, and aid in therapeutic treatment.
There are even applications already on the market that are perhaps not so obvious, for example making people pay per laugh when watching comedies in a cinema.

But there are, as always in life, dangers and drawbacks:
First of all, what should an AI-driven dialog system do with the information about the user’s emotional state?
A system reacting to my emotions seems more intelligent than a dumb toaster that ignores my urgency. But can it live up to the raised expectations?
I remember the weird moment, about 12 years ago when I programmed my first emotional dialogs, when my very simple if-then-else dialog seemed intelligent, just because the erroneous detection of my own emotional state had added a layer of unpredictability.
Symbolic AI, which models the world with a rule-based expert system, is to this day only successful in very limited domains. The same goes for systems that are based on machine learning: apart from a very small part of it, the world is just too complex to be modeled by an artificial neural network or a support vector machine.
Remember: everything that can happen will happen eventually, and while individual events might be rare, there is a really large number of them. So the world is chaotic by nature and eludes models!
Ontology-based machine learning techniques are a promising way to get the best of both worlds.
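
The effect of the simple if-then-else dialog mentioned above is easy to reproduce. The following sketch is not the original program but a hypothetical reconstruction of the idea: the emotion labels, the replies and the 30 percent error rate are all assumptions chosen for illustration. A trivial rule-based responder is fed by a recognizer that sometimes returns the wrong label, and the errors alone make the replies vary from turn to turn.

```python
import random

# Hypothetical label set, chosen only for this illustration.
EMOTIONS = ["neutral", "angry", "happy"]

def noisy_detector(true_emotion, error_rate, rng):
    """Simulated recognizer: returns a wrong label with probability error_rate."""
    if rng.random() < error_rate:
        return rng.choice([e for e in EMOTIONS if e != true_emotion])
    return true_emotion

def respond(detected_emotion):
    """The whole 'dialog strategy': a plain if-then-else on the detected label."""
    if detected_emotion == "angry":
        return "I am sorry, let me try to fix that right away."
    elif detected_emotion == "happy":
        return "Glad to hear it! What else can I do for you?"
    else:
        return "How can I help you?"

# The user stays neutral in every turn; only the detection errors vary the replies.
rng = random.Random(0)
for turn in range(5):
    detected = noisy_detector("neutral", error_rate=0.3, rng=rng)
    print(turn, detected, "->", respond(detected))
```

The rules never change and the simulated user never changes mood, so any variety in the output comes from misrecognition only. That is exactly the kind of accidental "intelligence" that raises expectations the system cannot meet.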

Another issue to be aware of is the ethical consequences of emotion detection technology: there are thousands of definitions of emotions, but most include that emotional expression is something that humans are not conscious of, cannot control directly and in many cases don’t want to have advertised. So we have to be very careful how we use these systems if we don’t want to take yet another step toward the world envisaged by George Orwell.

Want to read more? Then continue with the second part of my post on this topic covering pitfalls, scientific benchmarks and conclusions.