Data Security: The 6 Most Frequently Asked Questions for Simone Hantke

June 17, 2021

Caro Bauer

Data security is key when it comes to using artificial intelligence in any kind of product. For this blog post, we interviewed our expert Simone Hantke, Director Data Intelligence at audEERING, on data security. We address the six most frequently asked questions around data security and how we handle sensitive data.

1. Simone, you are the Director Data Intelligence here at audEERING. What do you understand by the term data security?

Data security has, in my opinion, several aspects: the internal handling of data, e.g. for HR, and the handling of external data, e.g. from a client. Let us focus here on the aspects that are relevant to clients. When we work with client data, we follow several rules, among others the following two:
– Internal data storage only: The data is stored on our internal servers and access is restricted. No data is stored in the cloud or on employees’ laptops.
– Access for authorised persons only: First, the process of delivering the data to us has to be secure. From delivery until the end of a project, it is clear who has access and who is allowed to work with the data. Some clients grant only restricted access rights, so we have to set specific rules on our servers. We also have to track who worked with which data and when.

2. What is your position regarding data protection at audEERING?

I collect topics and ideas and share my knowledge with our research team. For data recordings, my work includes making suggestions, for example writing the data policies for recordings in special apps and then handing them over to our data privacy officer. When it comes to data storage, I have to check the licenses, with all the information on usage, and coordinate this with several other colleagues.

3. How is this data protection ensured, e.g., regarding the AI SoundLab portal, which involves sensitive voice and health data?

Speech data is always special because you cannot completely anonymize it. The sound of a person’s voice is so unique that, even without a name or any other personal information, someone could potentially recognize the person.

When it comes to our AI SoundLab portal, we record not only the voice data but also metadata. The more sensitive the data, the higher the data protection standards that must be met, and the more difficult it is to store the data. This means we have to ensure that this data is recorded, delivered and stored in strict accordance with European and German data protection laws.

4. Does audEERING have the right to use the collected information for other products?

This really depends on the type of data we receive and the license that applies. When we get data from our clients, the usage is in some cases very restricted. For health data, for example, it can happen that only one or two employees at audEERING are allowed to process the data at all. The usage is then also restricted to this specific project with the client. Furthermore, we have to delete the data afterwards, so there is no way we can use it for anything else.

In other cases, clients are happy to share data with us and allow us to use the features of the audio files for model building. For example, if we get data from a university, there are some open licenses that allow us to use this data for all kinds of products, research and model building.

5. What should users understand by the term pseudonymously stored user data, and what is the difference between anonymously and pseudonymously stored data?

Audio data is unique. If you know the person who is speaking, you would recognize him or her. This means that audio data cannot be completely anonymized if it is the real raw audio data that has not been processed, e.g., with a special filter.

At audEERING, we pseudonymize the audio data, i.e., we collect metadata connected to the raw audio files. Here is an example of how this works: If we only ask for gender or age, there can be several persons in the data pool who have the same gender and age, and nobody would really know who a given person is. However, if we also ask for nationality, the data gets less pseudonymous, and at some point it is not even pseudonymous anymore, because you are able to work out from all this data who the person is. That is why we try to collect as little data as possible, but as much as needed, to ensure that it stays as pseudonymous as possible.
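
The effect described above can be sketched in a few lines of Python. This is only an illustration, not audEERING's actual tooling: the records are invented, and the helper `smallest_group` is a hypothetical name. It counts how many entries in a data pool share the same combination of metadata values. The smaller the smallest group, the easier it becomes to single out one person.

```python
from collections import Counter

# Invented metadata records, for illustration only.
records = [
    {"gender": "f", "age": 34, "nationality": "DE"},
    {"gender": "f", "age": 34, "nationality": "FR"},
    {"gender": "m", "age": 51, "nationality": "DE"},
    {"gender": "m", "age": 51, "nationality": "DE"},
    {"gender": "f", "age": 34, "nationality": "DE"},
]


def smallest_group(records, fields):
    """Size of the smallest group of records that share the same values for `fields`."""
    groups = Counter(tuple(record[field] for field in fields) for record in records)
    return min(groups.values())


# With gender and age only, every person shares their profile with someone else.
print(smallest_group(records, ["gender", "age"]))  # 2

# Adding nationality leaves one person with a unique profile.
print(smallest_group(records, ["gender", "age", "nationality"]))  # 1
```

In this toy pool, asking for one additional attribute is enough to make one entry unique, which is the reason for collecting as little metadata as possible.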

6. What are the specifics around data security in the healthcare sector (in general)?

For health data, the laws are much stricter than for other data. If a company would like to create a health product based on this data, certain certificates are needed. So, if a company wants to enter the healthcare sector, there are a lot more barriers. For future medical applications, we are working with partners, as we are not a medical company ourselves. Currently, for example, we are working with several hospitals on projects.