©2021 audEERING GmbH Soroosh Mashal 18.02.2021
In the past two episodes, we clarified the definitions and learned one of the common ways to model emotions. In this episode, we want to explain what Machine Learning and Artificial Intelligence are. We also show how we use these technologies to create an AI agent.
Machine Learning means teaching computers to use the same rules as humans do. Rather than giving them the rules, we let them find the rules and patterns themselves. Machine Learning is one of the fundamental processes in the creation of Artificial Intelligence (AI), which brings us to the next definition. AI is when computer systems can perform tasks that normally require human intelligence, such as visual perception, speech, or emotion recognition. A system created using this method is sometimes referred to as an “AI agent”, and our responsibility is to “train” this agent.
To do this, computers need to see examples to learn from. It is similar to the process of raising children. These examples are called data. We gather a lot of data and annotate it manually. For example, we might have a 3-second audio clip of somebody yelling, with a label that says it is 70% angry, and a 2-second burst of laughter, with a label that says it is 90% happy. However, we must ensure that the labels remain objective. That’s why we let a big group of people with different perceptions (e.g. cultural and gender variations) annotate these pieces of audio. After averaging the results and ensuring that the annotators agree with each other, we feed the annotated audio to the computer.
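The averaging and agreement check described above can be sketched in a few lines of Python. This is only an illustration, not our production pipeline: the function name `aggregate_ratings`, the 0-to-1 rating scale, and the `max_spread` threshold are all assumptions made for this example.

```python
from statistics import mean, pstdev


def aggregate_ratings(ratings, max_spread=0.15):
    """Average the annotators' scores for one clip and check their agreement.

    ratings: one score per annotator, between 0 and 1 (e.g. "how angry?").
    max_spread: hypothetical limit on the standard deviation of the scores;
    above it, we treat the clip as too ambiguous to use as a training label.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    label = mean(ratings)      # the averaged annotation
    spread = pstdev(ratings)   # how much the annotators disagree
    return label, spread <= max_spread


# Four annotators rate a 3-second clip of somebody yelling.
yelling = [0.7, 0.8, 0.6, 0.7]
label, agreed = aggregate_ratings(yelling)

# Four annotators who strongly disagree about another clip.
_, ambiguous_agreed = aggregate_ratings([0.1, 0.9, 0.2, 0.8])
```

In this sketch the yelling clip ends up with a label of about 0.7 (70% angry) and passes the agreement check, while the second clip is flagged because its ratings are too far apart.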
We generally break our data into three groups: a training set, a development set, and a test set.
First, we use the training dataset to train the models. The model is a neural network that plays the role of the brain for our AI agent. In this stage, we teach our computer to learn. We try different methods, such as GANs (Generative Adversarial Networks) or CNNs (Convolutional Neural Networks), and a variety of algorithms until we find a satisfactory result.
Then, we use the second dataset, the development set, to develop the model further. This can be compared to the teenage years of a human: the AI gets to try everything for itself. It is a kind of playground of life for our AI. The AI tries thousands of different variables until it reaches a stage where things do not change much anymore. Now that our teenage AI has grown enough, it is up to us as parents to test it before it can go out into its adult life and be a useful tool for humans.
That is why we create the “test set”. It is a set of data that our AI has never seen before, neither in the training phase nor in the development phase. How well it performs on this test defines how accurate and intelligent our AI has become. Usually, we define this threshold by human-level performance. This means we expect our AI to be as close to human-level perception as possible. For some tasks, like Automatic Speech Recognition, the ground truth is quite clear (what the person says), so we can strive to reach 100%. It gets tricky for other tasks, like emotion recognition, where humans do not agree with each other. For such tasks, we still take objectivity into account and expect the AI to perform as close to us as possible.
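The three-way split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `split_dataset` and the fractions `train_frac` and `dev_frac` are placeholders chosen for the example, not the proportions we actually use.

```python
import random


def split_dataset(items, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle the items and cut them into training, development, and test sets.

    The fractions are hypothetical defaults; whatever is left after the
    training and development sets becomes the test set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split repeatable
    n_train = round(len(items) * train_frac)
    n_dev = round(len(items) * dev_frac)
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]      # never touched during training or development
    return train, dev, test


# 100 annotated clips, represented here only by their index.
train, dev, test = split_dataset(range(100))
```

The important property is that the three sets do not overlap, so the test set really is data the AI has never seen before.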
For example, if I say a sentence with a relatively joyful voice, and you record it and let four random people on this planet listen to it, three might find it joyful and one might find it neutral. I may believe that I was joyful, but only 75% of people perceive me as joyful, and since emotions are something that we express, we always have to take perception into account. In other words, human-level emotional perception in this case was 75%, and we expect our AI to perform as close to this level as possible.
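The arithmetic behind this example can be written as a short Python sketch. The function name `human_level_agreement` is made up for this illustration, and using the majority label as the reference is an assumption of the sketch, not a description of how we evaluate our models.

```python
from collections import Counter


def human_level_agreement(perceived_labels):
    """Return the most common label and the share of listeners who chose it."""
    if not perceived_labels:
        raise ValueError("need at least one listener")
    counts = Counter(perceived_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(perceived_labels)


# Four listeners hear the same recording.
listeners = ["joyful", "joyful", "joyful", "neutral"]
label, agreement = human_level_agreement(listeners)
print(label, agreement)  # joyful 0.75
```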
Finally, when our AI model is ready, we integrate it into different applications and ship it to our customers. It will be another tool to help you and me do our tasks more efficiently and enjoy more free time with our loved ones. That is how we train computers to understand human emotions. In the next episode, we will learn more about some of the applications of this technology in today’s world.