Machine learning models

Essentia includes algorithms for running inference with two types of data-driven machine learning models that can be used for high-level annotation of music audio:

  • TensorFlow models

  • Gaia’s SVM classifiers

We provide various pre-trained models of both types for various music classification tasks (see downloads page).

Using pre-trained TensorFlow models

Most recently, Essentia provides a wrapper for inference with TensorFlow deep learning models. Follow these instructions to install and use Essentia with this wrapper.

We provide many pre-trained TensorFlow models for auto-tagging and music classification that can be used out of the box.

See the demos of these models here.

TODO text with recommendations how to build your own models.

Using pre-trained SVM Gaia models

Essentia includes several pre-trained SVM classifier models for genres, moods, and instrumentation. To use them you need to:

  • Install Gaia2 library (supported on Linux/OSX)

  • Build Essentia with examples and Gaia (--with-examples --with-gaia)

  • Use essentia_streaming_extractor_music and configure it to include classifier models (see detailed documentation)

You can train your own classifier models as described below.

Training and running classifier models in Gaia

To run SVM classification in Essentia you need to prepare a classifier model in Gaia and run the GaiaTransform algorithm configured to use this model. The example of using high-level models can be seen in the code of streaming_music_extractor. Here we discuss the steps to be followed to train classifier models that can be used with this extractor.

  1. Compute music descriptors using streaming_music_extractor for all audio files.

  2. Install Gaia with python bindings.

  3. Prepare JSON groundtruth and filelist files (see examples).
    • Groundtruth file maps identifiers for audio files (they can be paths to audio files or whatever id strings you want to use) to class labels.

    • Filelist file maps these identifiers to the actual paths to the descriptor files for each audio track.

  4. Currently, Gaia does not support loading descriptors in JSON format. As a workaround, you can configure the extractor output to YAML format in Step 1, or run json_to_sig.py conversion script.

  5. Run train_model.py script in Gaia (here) with these groundtruth and filelist files. The script will create the classifier model file.

  6. The model file can now be used by a GaiaTransform algorithm inside streaming_music_extractor.

Alternatively to steps 3-5, you can use a simplified script that trains a model given a folder with sub-folders corresponding to class names and containing descriptor files for these classes.

Note that using a specific classifier model implies that you are expected to give a pool with the same descriptor layout as the one used in training as an input to the GaiaTransform algorithm.

How it works

To train the SVMs Gaia internally uses the LibSVM library. The training script automatically creates an SVM model given a ground-truth dataset using the best combination of parameters for data preprocessing and SVM that it can find in a grid search. Testing all possible combinations the script conducts 5-fold cross-validation for each one of them: The ground-truth dataset is randomly split into train and test sets, the model is trained on the train set and is evaluated on the test set. Results are averaged across 5 folds including the confusion matrix. After all combinations of parameters have been evaluated, the winning combination is selected according to the best accuracy obtained in cross-validation and the final SVM classifier model is trained using all ground-truth data. See the “Cross-validation and Grid-search” section in the practical guide to SVM classification for more details.

The combinations of parameters tested in a grid search by default are mentioned in the code. Users can modify these parameters according to their needs by creating such a classification project file on their own.

The parameters include:

  • SVM kernel type: polynomial or RBF

  • SVM type: currently only C-SVC

  • SVM C and gamma parameters

  • preprocessing type:

  • use all descriptors, no preprocessing

  • use lowlevel.* descriptors only

  • discard energy bands descriptors (*barkbands*, *energyband*, *melbands*, *erbbands*)

  • use all descriptors, normalize values

  • use all descriptors, normalize and gaussianize values

  • number of folds in cross-validation: 5 by default

In the preprocessing stage, the training script loads all descriptor files according to the preprocessing type. Additionally, some descriptors are always ignored, including all metadata* that is the information not directly associated with audio analysis. The *.dmean, *.dvar, *.min, *.max, *.cov descriptors are also ignored, and therefore, currently only means and variances are used for descriptors summarized across frames. Non-numerical descriptors are enumerated (tonal.chords_key, tonal.chords_scale, tonal.key_key, tonal.key_scale).

Note that cross-validation script splits the ground-truth dataset into train and test sets randomly. In the case of music classification tasks, one may want to assure artist/album filtering (that is, no artist/album occurs in the test set if it occurs in train set). The current way to achieve it is to ensure that the whole input dataset contains only one item per artist/album. Alternatively, you can adapt the scripts to suit your needs.

How to train an SVM model with a different set of parameters

Our training script generates a single model retrained on the whole dataset with the best parameters combination from the grid search. However, you may want to generate new models with custom parametrizations. Imagine, for instance, that you need a model that runs on a lighter set of features despite the accuracy drop, or that you believe that a different parameter set can improve results for your particular scenario.

To generate a model given the <project_file> and your chosen <param_file> from the results folder, execute the following python lines:

from gaia2.scripts.classification.retrain_model import retrainModel
retrainModel(project_file, param_file, output_file)

This creates a Gaia model and saves it into <output_file>.

Also, note that the retrain_model can be called as a command-line program.

How to choose a parameter configuration

At the end of the training process, a file called <project_name>.report.csv is created. It provides a ranking in terms of accuracy and normalized accuracy as well as the standard deviation between folds for every set of parameters. By having a look at this file you can get some insights about which parameters to try. You can, for instance, estimate the expected accuracy drop if you decide to go for a configuration with a smaller set of descriptors.