The Munich Versatile and Fast Open-Source Audio Feature Extractor
Florian Eyben, Felix Weninger, Martin Woellmer, Bjoern Schuller
The openSMILE feature extraction tool enables you to extract large audio feature spaces in real time. It combines features from Music Information Retrieval and Speech Processing. SMILE is an acronym for Speech & Music Interpretation by Large-space Extraction. It is written in C++ and is available both as a standalone command-line executable and as a dynamic library. The main features of openSMILE are its capability for on-line incremental processing and its modularity. Feature extractor components can be freely interconnected to create new and custom features, all via a simple configuration file. New components can be added to openSMILE via an easy binary plugin interface and a comprehensive API.
openSMILE is used not only in commercial products but also in many scientific publications and academic projects. It is a widely used feature extraction and pattern recognition tool that is applied to a large variety of use cases (for more details, visit the “References” section).
The key features of the openSMILE toolkit are:
- Cross-platform (Windows, Linux, Mac, Android)
- Fast and efficient incremental processing in real-time
- High modularity and reusability of components
- Plugin support
- Multi-threading support for parallel feature extraction
- Audio I/O:
  - PCM WAVE file reader/writer
  - Live sound recording and playback via the PortAudio library
  - Acoustic echo cancellation for full duplex recording/playback in an open-microphone setting (via the Speex codec library)
- General audio signal processing:
  - Windowing functions (Hamming, Hann, Gauss, Sine, …)
  - Fast Fourier Transform
  - Pre-emphasis filter
  - FIR filterbanks
- Extraction of speech-related features, e.g.:
  - Signal energy
  - Voice quality (Jitter, Shimmer)
  - Line Spectral Pairs (LSP)
  - Spectral shape descriptors
- Music-related features:
  - Pitch classes (semitone spectrum)
  - CHROMA and CENS features
- Weighted differential
- Moving average smoothing of feature contours
- Moving average mean subtraction and variance normalisation (e.g. for on-line cepstral mean subtraction)
- On-line histogram equalisation
- Delta Regression coefficients of arbitrary order
- Statistical functionals (feature summaries), e.g.:
  - Means, Extremes
  - Linear and quadratic regression
  - DCT coefficients
  - Modulation-spectrum (new)
- Popular I/O file formats are supported:
- Fully HTK compatible MFCC, PLP, (log-)energy, and delta regression coefficient computation
- Fast: 27k features can be extracted with a real-time factor (RTF) of 0.08
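The RTF figure in the last item is conventionally read as processing time divided by audio duration, so an RTF of 0.08 corresponds to roughly 0.08 seconds of computation per second of audio. A minimal sketch of that arithmetic in Python follows; the function names and the one-hour recording are illustrative assumptions and are not part of openSMILE.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time(rtf: float, audio_seconds: float) -> float:
    """Estimated compute time for a recording at a given RTF."""
    return rtf * audio_seconds


# Hypothetical example: one hour of audio at the quoted RTF of 0.08.
estimate = processing_time(0.08, 3600.0)
print(f"{estimate:.0f} s")  # prints "288 s"
```

Any RTF below 1.0 means the extractor keeps up with live input; actual timings depend on the hardware and the configuration used.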
For a complete description of openSMILE’s features, please read the openSMILE book. It contains the most detailed and up-to-date description of all the components that are currently included in the toolkit.
Another good resource is the on-line help of the command-line feature extractor. Type SMILExtract -H to get help on using it.
- The openSMILE project was started at Technische Universität München (TUM) in 2008 in the scope of the SEMAINE EU-funded research project by Florian Eyben, Martin Wöllmer, and Björn Schuller. The goal of SEMAINE was to design an automated virtual agent with affective and social skills. openSMILE served the purpose of a real-time speech and emotion analyser component in this system. In the final SEMAINE release, version 1.0.1 of openSMILE is used.
- In 2009, the first open-source Emotion and Affect Recognition toolkit (openEAR) based on the openSMILE feature extractor and framework was published.
- In 2010, version 1.0.1 of openSMILE was published and presented at the ACM-MM open-source software challenge – winning an honorable mention.
- From 2011 onwards, openSMILE was further developed by Florian Eyben and Felix Weninger during their PhD thesis work at Technische Universität München, Germany. Major contributions were made by Erik Marchi for the ASC-Inclusion EU project.
- In 2013, audEERING acquired the rights to the code base from TUM, and version 2.0 (release candidate) was released under an open-source research license.
- In 2014, audEERING started hosting the openSMILE website with the release of the 2.1 version.
openSMILE has been maintained by audEERING since 2013. Versions 2.0 and above are distributed free of charge for research and personal use under the terms of the openSMILE research-only open-source license.
For commercial use, we provide individualised, flexible commercial licensing options for any project size and budget. We also offer ready-to-use speech analysis services and software products based on our proprietary extensions to the openSMILE core. Expert technical support is also available to help you get started and integrate openSMILE in your developments quickly. Contact us today to receive your customized offer and talk about your possibilities!
Also see our products page to learn about proprietary extensions to openSMILE, such as advanced signal processing and new acoustic features, networking support and distributed processing, Android client/server integration, pre-trained models, and intelligent voice activity detection.
The brand new release of openSMILE 2.3 is now available (Oct. 28th 2016). It can be downloaded here as tar.gz and here as .zip. Version 2.3 includes Android JNI integration, an updated configuration file interface, a batch feature extraction GUI for Windows, improved backwards compatibility, an updated version of the ComParE 2013-2015 baseline acoustic parameter set, as well as several bug fixes and performance improvements.
The preview release for openSMILE 2.2 is now available (Oct. 2nd 2015). This is the first release which contains the configuration files for the first release of the Geneva Minimalistic Acoustic Parameter Set (GeMAPS). Binaries for Linux (statically linked, no PortAudio support) are included with the package. Builds for Windows and Android will follow in the final release. Download: openSMILE-2.2rc1 (as tar.gz file) and here as zip file.
The current stable version of openSMILE is 2.1 (released Dec. 23rd 2014). You can download the full package, including source and binaries for Windows, Linux, and Android, here as tar.gz for Unix/Linux/Mac and here as .zip for Windows (note that both packages have the same content; only the compression format differs).
The previous release is openSMILE-2.0-rc1, available here.
The latest version of openSMILE is version 2.1 – to be released Dec 5th 2014. We will provide pre-compiled binaries:
- For Windows (Win32): openSMILE-2.1pre1-win32-bin.zip (available soon)
- For Linux (x64, statically linked): openSMILE-2.1pre1-linux_x64-bin.tar.gz (available soon)
The source code will also be available for download. Build files for Linux (automake) and Windows (Visual Studio 2010) [soon] are included and supported.
Older releases are provided here only for archival purposes and older projects. These are not supported any more. The 2.x series should be used in new developments and research.
Installation and Documentation
openSMILE’s architecture and usage are well documented in the openSMILE book (available electronically as PDF). The book is included with every release in the doc/ folder.
For version 2.1, we have published an additional tutorial in ACM SIGMM records.
Detailed and extensive theoretical descriptions of the implemented algorithms and concepts can be found in Florian Eyben’s doctoral thesis “Real-time Speech and Music Classification by Large Audio Feature Space Extraction”, available from Springer. This is a must-have book for everyone who works with openSMILE and wants more insight into the theory behind the feature extraction algorithms.
Full installation and usage instructions are provided in the book. Here are quick-install instructions for the impatient:
- Run from a binary release: look for a suitable SMILExtract* binary (Linux) or SMILExtract*.exe (Windows) in the bin/ subdirectory of the release package and run it from the command line with the -h option to see the on-line help.
- Build the core from source on Linux: run the script buildStandalone.sh (requires automake, autoconf, and build-essentials, i.e. gcc, g++, make, and libtool, to be installed).
- Build the core from source on Windows in Visual Studio (2010 or higher with conversion of project files): open the openSMILE.sln solution in the folder ide/vs10 and build the full solution (you might have to build several times because Visual Studio does not automatically resolve all the dependencies between the projects in the solution).
- Compiling with PortAudio support: please read the openSMILE book.
Known-issues & FAQ
- I get an error message that openSMILE cannot read a WAVE file (Bogus RIFF header), what is wrong with my files? There is a possibility that your wave files are corrupted. In most cases, however, the wave files just contain some extended header information (WAVExt), which openSMILE cannot (yet) read.
- In most configuration files the upper frequency bound of the Mel filterbank is hard-coded to 8000 Hz. Does this mean that these configuration files implicitly assume a 16 kHz sample rate? No, the frequency range of the Mel filterbank is configured independently of the sample rate. For a 44.1 kHz input, the filterbank will still only span the frequency range from 20 to 8000 Hz, zeroing or ignoring all bins above 8 kHz. The minimum sampling rate that the input files should have with a filterbank configured for 20-8000 Hz is, however, 16 kHz. For lower sampling rates the filterbank will cover a reduced frequency range, changing the frequency assignments of the bands, which is not desired.
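The sample-rate reasoning in the last answer comes down to the Nyquist limit: audio sampled at fs Hz only carries frequencies up to fs/2, so a filterbank with an 8000 Hz upper bound needs an input of at least 16 kHz. A minimal sketch of that check in Python is given here; the helper names are hypothetical and do not correspond to openSMILE functions or configuration options.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2.0


def min_sample_rate_hz(filterbank_hi_hz: float) -> float:
    """Lowest sample rate whose Nyquist limit reaches the upper band edge."""
    return 2.0 * filterbank_hi_hz


def covers_filterbank(sample_rate_hz: float, filterbank_hi_hz: float = 8000.0) -> bool:
    """True if the input can fill the filterbank's full frequency range."""
    return nyquist_hz(sample_rate_hz) >= filterbank_hi_hz


# Check a few common sample rates against the 8000 Hz upper bound.
for fs in (8000, 16000, 44100):
    status = "ok" if covers_filterbank(fs) else "too low"
    print(f"{fs} Hz: Nyquist {nyquist_hz(fs):.0f} Hz -> {status}")
```

With these inputs, only the 8000 Hz recording falls short, since its Nyquist limit of 4000 Hz lies below the configured upper bound.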
If you use openSMILE in your research, please cite the following paper for version 2.x and above:
Florian Eyben, Felix Weninger, Florian Gross, Björn Schuller: “Recent Developments in openSMILE, the Munich Open-Source Multimedia Feature Extractor”, In Proc. ACM Multimedia (MM), Barcelona, Spain, ACM, ISBN 978-1-4503-2404-5, pp. 835-838, October 2013. doi:10.1145/2502081.2502224
For older work based on openSMILE version 1.0.1 and below, you may cite this paper:
Florian Eyben, Martin Wöllmer, Björn Schuller: “openSMILE – The Munich Versatile and Fast Open-Source Audio Feature Extractor”, In Proc. ACM Multimedia (MM), Florence, Italy, ACM, ISBN 978-1-60558-933-6, pp. 1459-1462, October 2010. doi:10.1145/1873951.1874246
We are always happy to hear what people are using openSMILE for. We would therefore appreciate it if you sent us a brief note with a reference to your paper and/or a brief description of your work.
Help and Support
Be sure to check the FAQ and Known-issues section if you have problems with running or installing openSMILE. Be sure to read and understand the documentation in the openSMILE book before you contact us for support.
If you cannot find an answer to your problem in any of these resources or you have found a bug, please contact Florian Eyben via e-mail (fe at audeering . com). If you need commercial support for openSMILE, please contact us at info at audeering.com and include a brief description of your project. We will get back to you shortly.
openSMILE’s development has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 211486 (SEMAINE) in years 2008-2010.