openSMILE 3.0: audEERING’s open-source, cross-platform audio feature extractor

Introducing openSMILE 3.0

Caro Bauer

We are happy to announce the availability of the next major release 3.0 of openSMILE, audEERING’s open-source, cross-platform audio feature extractor.

With more than 150,000 downloads since its first publication in 2010 and more than 2,650 citations in academic papers, openSMILE has become an immensely popular tool among the research community, audio-related companies and individuals. We strongly believe that researchers and enthusiasts should have free and unrestricted access to the fundamental tools that they need for their work and to make new advances in these fields. For this reason, audEERING is continuing to develop and maintain openSMILE by adding new features, fixing bugs, improving compatibility and adding support for new platforms. The new version continues to be made available under the same license as previous versions, i.e. free-of-charge to researchers and individuals for non-commercial use.

openSMILE 3.0 on GitHub

Starting with version 3.0, the source code, binaries and documentation of openSMILE are hosted on GitHub. We are providing pre-built binaries for 64-bit Linux, Windows and OS X (generated using the default build flags, excluding external dependencies such as PortAudio or FFmpeg). If you need any of this additional functionality or want to build for a different platform, you will have to compile openSMILE yourself from source code. Luckily, the build process itself has been completely overhauled in 3.0 and should be straightforward on most systems.

GitHub will also be the new recommended home for users to discuss openSMILE, get help and to report and track issues. It will also allow us to accept community contributions.

What’s new in openSMILE 3.0

Version 3.0 is the culmination of several years of development since the last public release 2.3 of openSMILE. Thus there are too many changes to list all of them in this blog post. In the following, we would like to walk you through some of the most noteworthy improvements. Readers familiar with openSMILE internals may also want to check out the technical changelog with a more detailed list of what’s new.

opensmile Python package

As part of openSMILE 3.0, we are happy to announce the opensmile Python package that makes it incredibly easy to use openSMILE functionality from within Python.

Python has undoubtedly become the language of choice for people working in the machine learning and data science space. The primary intended way of running previous versions of openSMILE was as a standalone command-line application. This meant that people aiming to integrate openSMILE into their Python-based machine learning pipelines had to manually invoke the SMILExtract executable in their scripts and pass data back and forth via audio, CSV or WEKA ARFF files. This approach was often tedious to set up, generally error-prone and in many ways inefficient. The new opensmile Python package provides a much more powerful and efficient way to perform openSMILE feature extraction from within Python. Getting started is as easy as running:

$ pip install opensmile

A separate installation of openSMILE is not required.

The package defines an object-oriented Python API that hides much of the internal complexity of openSMILE and returns a pandas.DataFrame for convenience. Extracting baseline ComParE 2016 acoustic features on a WAV file becomes as simple as:

import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# the result is a pandas.DataFrame containing the features
y = smile.process_file('audio.wav')

Instead of processing input from a file, you may also pass a numpy.ndarray of raw audio data as input:


# x is a numpy.ndarray containing the audio samples
y = smile.process_signal(x, sampling_rate=16000)

In addition to the ComParE 2016 baseline feature set, the library has built-in support for a number of other config files that ship with openSMILE. Of course, it is also possible to specify your own, custom config file.

For more information on the Python package, check out the documentation on GitHub.

SMILEapi

Under the hood, the Python package is built on top of a new low-level C API called SMILEapi. It provides full programmatic access to all functionality that is already being offered by the command-line SMILExtract tool and in addition offers advanced ways to pass data to and from openSMILE in real-time. While the standalone tool continues to be the recommended way to develop new config files and to perform manual feature extraction via the console, SMILEapi is intended to cover production deployment scenarios where a tighter integration of openSMILE is desired.

openSMILE ships with Python and C# wrappers for SMILEapi to facilitate calling the API from within these languages. More information on SMILEapi can be found in the documentation.

New components

openSMILE 3.0 ships with a number of new components:

  • cDataPrintSink
  • cFunctionalModulation (re-added from version 2.2)
  • cFFmpegSource
  • cExternalSource
  • cExternalAudioSource
  • cExternalSink
  • cExternalMessageInterface
  • cVectorBinaryOperation

cDataPrintSink is primarily meant for debugging purposes and prints out data as text to the standard output or the log file.

The cFFmpegSource component adds an FFmpeg-based audio file source that allows openSMILE to read almost any audio format. In order to use this component, you need to have the FFmpeg libraries installed and build openSMILE from source. In any config file, replace uses of cWaveSource with cFFmpegSource to leverage the new component.
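
A minimal sketch of that replacement, assuming the config is available as plain text and that the name cWaveSource only occurs as a component type. The helper name use_ffmpeg_source and the shortened config fragment are hypothetical; any options that cWaveSource accepts but cFFmpegSource does not would still have to be removed by hand.

```python
import re


def use_ffmpeg_source(config_text: str) -> str:
    """Return config_text with every cWaveSource type name replaced by cFFmpegSource."""
    # \b ensures that only the exact component name is matched.
    return re.sub(r"\bcWaveSource\b", "cFFmpegSource", config_text)


# Hypothetical config fragment, shortened for illustration.
sample_config = """\
[componentInstances:cComponentManager]
instance[waveIn].type=cWaveSource

[waveIn:cWaveSource]
writer.dmLevel=wave
filename=\\cm[inputfile(I):name of input file]
"""

print(use_ffmpeg_source(sample_config))
```

The instance name (waveIn in this fragment) can stay unchanged, since only the component type differs.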

The cExternal* components are specifically for interacting with the SMILEapi and allow passing data back and forth between openSMILE and the host application.

You can find more details about all of these and the other new components in the Components section of the documentation.

New config files

openSMILE 3.0 ships with a minor update of the GeMAPS and eGeMAPS feature sets that fixes a numerical stability issue of certain features. The original version of these sets continues to be available as GeMAPS v01a and eGeMAPS v01a while the new version is named v01b. We recommend using the new version for all new projects. For backward compatibility with trained models, however, you may want to keep using the original version.

Revised build process using CMake

The previous build scripts based on autotools have been modernized and fully rewritten using CMake. The new process makes it easier to customize the build, for example by setting non-default build flags, linking against external dependencies or using non-default compilers.

iOS support and revised Android integration

In openSMILE 3.0, we are adding support for iOS as a second supported mobile platform, in addition to Android, for which we introduced support in version 2.2. The openSMILE distribution includes two sample app projects for Android Studio and Xcode that demonstrate how openSMILE can be integrated into mobile apps by means of the SMILEapi. The Android sample included in openSMILE 2.3 has been completely revised for this purpose.

Performance and memory usage improvements

The new release brings a couple of noteworthy performance-related improvements. The largest individual gains stem from optimizations of the cSpecScale and cFunctionalModulation components (included in e.g. GeMAPS and ComParE 2016 feature configs). In combination with numerous other performance tweaks across the code base, we see a consistent 40-50% reduction in processing time for commonly-used feature sets between openSMILE 2.3 and 3.0.

Another major improvement has been made in run-time memory usage for data memory levels that store raw audio samples. By reducing the amount of metadata that openSMILE internally stores for these levels, we observe an 80-90% reduction in memory consumption for common feature sets.

HTML documentation

The openSMILE documentation that was previously made available as a PDF file (also known as the “openSMILE book”) is now made available online at https://audeering.github.io/opensmile/. Many parts of the documentation have been revised and updated for this release. Most notably, the documentation now integrates information from the built-in component help system in openSMILE, with descriptions and parameter information for every openSMILE component. This information was previously only accessible through the help command in SMILExtract but is now made browseable and searchable online. It can be found under the Components section in the documentation.

Colorized log output

openSMILE now highlights parts of log messages in color, depending on the log message type.

This may seem like a minor change but can be very helpful when you are looking for warnings or errors in a long, verbose log.

Next Steps

For all current users of openSMILE 2.x, we highly recommend updating to the new release so you can take advantage of the new features, bugfixes and performance improvements. Version 3.0 is fully backward-compatible with configurations created for earlier releases, so updating should be very simple in most cases.

Please use the issue tracker on GitHub to provide feedback and let us know if you encounter any technical problems with the new release. For commercial inquiries related to openSMILE licensing or our other products, please contact us at info@audeering.com.