Transparent AI Part 3: Machine Learning and AI Processes

Soroosh Mashal

In the past two episodes, we clarified the definition and learned one of the common ways to model emotions. In this episode, we want to see what machine learning and artificial intelligence are, and how we can use them to create an AI agent.

Teaching computers the way we teach humans, with examples rather than explicit rules, is called machine learning. Rather than giving computers the rules, we let them find the rules and patterns themselves. Machine learning is one of the fundamental processes in the creation of Artificial Intelligence (AI), which brings us to the next definition.

Transparency in Technology

AI is when computer systems can perform tasks normally requiring human intelligence, such as visual perception, speech, or emotion recognition. A system created using this method is sometimes referred to as an “AI agent”, and our responsibility is to “train” this agent. To do this, computers need to see examples in order to learn, the same way our children do. These examples are called data. We gather a lot of data and manually annotate it. That means we have a 3-second audio clip of yelling paired with a label that says it is 70% angry, and a 2-second burst of laughter paired with a label that says it is 90% happy. However, we must ensure that these labels remain objective. That is why we let a large group of people with different perceptions (e.g. cultural and gender variations) annotate these pieces of audio; after averaging the results and making sure these people agree on something, we feed the computer with them.
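The averaging-and-agreement step can be sketched in a few lines of code. This is a minimal illustration, not a real annotation pipeline: the ratings and the 0.2 agreement threshold are made-up values chosen for the example.

```python
# Minimal sketch: combine emotion intensity ratings from several annotators.
# Ratings are on a 0.0-1.0 scale; the 0.2 spread threshold is an assumption.

def aggregate_annotations(ratings, max_spread=0.2):
    """Average per-annotator intensity ratings for one audio clip.

    Returns the mean rating, or None when the annotators disagree too
    much (spread between highest and lowest rating exceeds max_spread).
    """
    if max(ratings) - min(ratings) > max_spread:
        return None  # annotators disagree; drop this clip from the dataset
    return sum(ratings) / len(ratings)

# Five annotators rate a 3-second clip of yelling for anger:
anger_ratings = [0.7, 0.75, 0.65, 0.7, 0.7]
print(aggregate_annotations(anger_ratings))  # close agreement -> 0.7

# Annotators who strongly disagree produce no usable label:
print(aggregate_annotations([0.9, 0.2, 0.5]))  # None
```

Only clips with a clear consensus make it into the dataset; the rest are set aside rather than fed to the computer with a noisy label.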

Data as the Algorithm’s Food

We generally break our data into 3 groups:

  1. training data
  2. development data
  3. test data
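The three-way split above can be sketched as follows. This is a simplified illustration; the 80/10/10 ratio is a common convention, not a fixed rule, and real pipelines often stratify the split rather than just shuffling.

```python
import random

def split_dataset(samples, seed=42):
    """Shuffle the annotated samples and split them into the three groups."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * 0.8)  # 80% training data
    n_dev = int(len(shuffled) * 0.1)    # 10% development data
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])  # remaining 10% is test data

train, dev, test = split_dataset(list(range(1000)))
print(len(train), len(dev), len(test))  # 800 100 100
```

The important property is that the three groups never overlap: a clip the model trained on must not reappear in the test set.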

First, we use the training dataset to train the model. The model is a neural network that plays the role of the brain of our AI agent. This is the stage where we teach our baby computer. We try different methods, like GANs (Generative Adversarial Networks) or CNNs (Convolutional Neural Networks), and a variety of algorithms until we find a satisfactory result. Then we use the second dataset, the development data, to tune the model. It is the AI's teenage years, when it gets to try everything for himself; a kind of playground of life for our AI. The AI tries thousands of different variables until it reaches a stage where things don't change that much anymore.
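The stopping point, when "things don't change that much anymore", can be illustrated with a toy convergence check. The evaluation function and the 0.001 improvement threshold here are made-up stand-ins for a real development-set evaluation.

```python
# Toy illustration: keep tuning while the development-set score still
# improves noticeably; stop once the improvement becomes negligible.

def tune_until_stable(evaluate_on_dev, min_improvement=0.001, max_rounds=10000):
    best = evaluate_on_dev(0)
    for round_number in range(1, max_rounds):
        score = evaluate_on_dev(round_number)
        if score - best < min_improvement:
            return best, round_number  # converged: gains too small to matter
        best = score
    return best, max_rounds

# Stand-in learning curve: accuracy climbs quickly, then flattens out.
scores = [0.50, 0.65, 0.74, 0.80, 0.84, 0.86, 0.87, 0.8705, 0.871]
best, rounds = tune_until_stable(lambda i: scores[min(i, len(scores) - 1)])
print(best, rounds)  # stops at 0.87 after 7 rounds
```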

Now that our teenage AI has grown enough, it is up to us as parents to test him before he can go out into his adult life and be a useful tool for humans.


That’s why we create the “test set”: a set of data that our AI has never seen before, neither in the training phase nor in the development phase. How well he performs on this test defines how accurate and intelligent our AI has become. Usually, we define this threshold by human-level performance, meaning we expect our AI to be as close to human-level perception as possible. For some tasks, like Automatic Speech Recognition, where the ground truth is clear (what the person says), we can strive to reach 100%, but it gets tricky for other tasks, like Emotion Recognition, where humans don’t agree with each other.

For such tasks, we still take objectivity into account and expect the AI to perform as close to us as possible. If I say a sentence in a relatively joyful voice, and you record it and let 4 random people on this planet listen to it, 3 might find it joyful and 1 might find it neutral. Meanwhile, I believe I was joyful, but only 75% of people perceived me as joyful; and since emotions are something that we express, we always have to take perception into account. In other words, human-level emotional perception in this case was 75%, and we expect our AI to perform as close to this level as possible.
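The 75% example above is just a small computation: human-level perception is the share of listeners who agree with the speaker's intended emotion, and the AI is judged against that baseline. The listener labels and the model score below are the made-up values from the example.

```python
# Human-level perception as the fraction of listeners who perceive the
# emotion the speaker intended.

def human_level_perception(intended, perceived_labels):
    agree = sum(1 for label in perceived_labels if label == intended)
    return agree / len(perceived_labels)

listeners = ["joyful", "joyful", "joyful", "neutral"]
hlp = human_level_perception("joyful", listeners)
print(hlp)  # 3 of 4 listeners -> 0.75

model_score = 0.72  # hypothetical model accuracy on the same clips
print(abs(hlp - model_score))  # gap between the AI and human level
```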

That’s how computers can be trained to understand human emotions. In the next episode, we will go over some of the applications of this technology in today’s world.

