Essentia Python examples

This Jupyter notebook demonstrates typical usage examples of Essentia in Python.

Computing features with MusicExtractor

MusicExtractor is a multi-purpose algorithm for feature extraction from files (see the complete list of computed features here). It combines many algorithms and is also used inside Essentia's command-line Music Extractor. For any given input filename, all computed features are stored in two output Pools, which you can access for your needs or store to a file. One of the pools contains raw frame data, while the other contains statistics aggregated across frames.

# This is what the audio we want to process sounds like
import IPython
IPython.display.Audio('../../../test/audio/recorded/dubstep.flac')
import essentia
import essentia.standard as es

# Compute all features, aggregate only 'mean' and 'stdev' statistics for all low-level, rhythm and tonal frame features
features, features_frames = es.MusicExtractor(lowlevelStats=['mean', 'stdev'],
                                              rhythmStats=['mean', 'stdev'],
                                              tonalStats=['mean', 'stdev'])('../../../test/audio/recorded/dubstep.flac')

# See all feature names in the pool in sorted order
print(sorted(features.descriptorNames()))
['lowlevel.average_loudness', 'lowlevel.barkbands.mean', 'lowlevel.barkbands.stdev', 'lowlevel.barkbands_crest.mean', 'lowlevel.barkbands_crest.stdev', 'lowlevel.barkbands_flatness_db.mean', 'lowlevel.barkbands_flatness_db.stdev', 'lowlevel.barkbands_kurtosis.mean', 'lowlevel.barkbands_kurtosis.stdev', 'lowlevel.barkbands_skewness.mean', 'lowlevel.barkbands_skewness.stdev', 'lowlevel.barkbands_spread.mean', 'lowlevel.barkbands_spread.stdev', 'lowlevel.dissonance.mean', 'lowlevel.dissonance.stdev', 'lowlevel.dynamic_complexity', 'lowlevel.erbbands.mean', 'lowlevel.erbbands.stdev', 'lowlevel.erbbands_crest.mean', 'lowlevel.erbbands_crest.stdev', 'lowlevel.erbbands_flatness_db.mean', 'lowlevel.erbbands_flatness_db.stdev', 'lowlevel.erbbands_kurtosis.mean', 'lowlevel.erbbands_kurtosis.stdev', 'lowlevel.erbbands_skewness.mean', 'lowlevel.erbbands_skewness.stdev', 'lowlevel.erbbands_spread.mean', 'lowlevel.erbbands_spread.stdev', 'lowlevel.gfcc.cov', 'lowlevel.gfcc.icov', 'lowlevel.gfcc.mean', 'lowlevel.hfc.mean', 'lowlevel.hfc.stdev', 'lowlevel.loudness_ebu128.integrated', 'lowlevel.loudness_ebu128.loudness_range', 'lowlevel.loudness_ebu128.momentary.mean', 'lowlevel.loudness_ebu128.momentary.stdev', 'lowlevel.loudness_ebu128.short_term.mean', 'lowlevel.loudness_ebu128.short_term.stdev', 'lowlevel.melbands.mean', 'lowlevel.melbands.stdev', 'lowlevel.melbands128.mean', 'lowlevel.melbands128.stdev', 'lowlevel.melbands_crest.mean', 'lowlevel.melbands_crest.stdev', 'lowlevel.melbands_flatness_db.mean', 'lowlevel.melbands_flatness_db.stdev', 'lowlevel.melbands_kurtosis.mean', 'lowlevel.melbands_kurtosis.stdev', 'lowlevel.melbands_skewness.mean', 'lowlevel.melbands_skewness.stdev', 'lowlevel.melbands_spread.mean', 'lowlevel.melbands_spread.stdev', 'lowlevel.mfcc.cov', 'lowlevel.mfcc.icov', 'lowlevel.mfcc.mean', 'lowlevel.pitch_salience.mean', 'lowlevel.pitch_salience.stdev', 'lowlevel.silence_rate_20dB.mean', 'lowlevel.silence_rate_20dB.stdev', 'lowlevel.silence_rate_30dB.mean', 'lowlevel.silence_rate_30dB.stdev', 'lowlevel.silence_rate_60dB.mean', 'lowlevel.silence_rate_60dB.stdev', 'lowlevel.spectral_centroid.mean', 'lowlevel.spectral_centroid.stdev', 'lowlevel.spectral_complexity.mean', 'lowlevel.spectral_complexity.stdev', 'lowlevel.spectral_contrast_coeffs.mean', 'lowlevel.spectral_contrast_coeffs.stdev', 'lowlevel.spectral_contrast_valleys.mean', 'lowlevel.spectral_contrast_valleys.stdev', 'lowlevel.spectral_decrease.mean', 'lowlevel.spectral_decrease.stdev', 'lowlevel.spectral_energy.mean', 'lowlevel.spectral_energy.stdev', 'lowlevel.spectral_energyband_high.mean', 'lowlevel.spectral_energyband_high.stdev', 'lowlevel.spectral_energyband_low.mean', 'lowlevel.spectral_energyband_low.stdev', 'lowlevel.spectral_energyband_middle_high.mean', 'lowlevel.spectral_energyband_middle_high.stdev', 'lowlevel.spectral_energyband_middle_low.mean', 'lowlevel.spectral_energyband_middle_low.stdev', 'lowlevel.spectral_entropy.mean', 'lowlevel.spectral_entropy.stdev', 'lowlevel.spectral_flux.mean', 'lowlevel.spectral_flux.stdev', 'lowlevel.spectral_kurtosis.mean', 'lowlevel.spectral_kurtosis.stdev', 'lowlevel.spectral_rms.mean', 'lowlevel.spectral_rms.stdev', 'lowlevel.spectral_rolloff.mean', 'lowlevel.spectral_rolloff.stdev', 'lowlevel.spectral_skewness.mean', 'lowlevel.spectral_skewness.stdev', 'lowlevel.spectral_spread.mean', 'lowlevel.spectral_spread.stdev', 'lowlevel.spectral_strongpeak.mean', 'lowlevel.spectral_strongpeak.stdev', 'lowlevel.zerocrossingrate.mean', 'lowlevel.zerocrossingrate.stdev', 
'metadata.audio_properties.analysis.downmix', 'metadata.audio_properties.analysis.equal_loudness', 'metadata.audio_properties.analysis.length', 'metadata.audio_properties.analysis.sample_rate', 'metadata.audio_properties.analysis.start_time', 'metadata.audio_properties.bit_rate', 'metadata.audio_properties.codec', 'metadata.audio_properties.length', 'metadata.audio_properties.lossless', 'metadata.audio_properties.md5_encoded', 'metadata.audio_properties.number_channels', 'metadata.audio_properties.replay_gain', 'metadata.audio_properties.sample_rate', 'metadata.tags.file_name', 'metadata.version.essentia', 'metadata.version.essentia_git_sha', 'metadata.version.extractor', 'rhythm.beats_count', 'rhythm.beats_loudness.mean', 'rhythm.beats_loudness.stdev', 'rhythm.beats_loudness_band_ratio.mean', 'rhythm.beats_loudness_band_ratio.stdev', 'rhythm.beats_position', 'rhythm.bpm', 'rhythm.bpm_histogram', 'rhythm.bpm_histogram_first_peak_bpm', 'rhythm.bpm_histogram_first_peak_weight', 'rhythm.bpm_histogram_second_peak_bpm', 'rhythm.bpm_histogram_second_peak_spread', 'rhythm.bpm_histogram_second_peak_weight', 'rhythm.danceability', 'rhythm.onset_rate', 'tonal.chords_changes_rate', 'tonal.chords_histogram', 'tonal.chords_key', 'tonal.chords_number_rate', 'tonal.chords_scale', 'tonal.chords_strength.mean', 'tonal.chords_strength.stdev', 'tonal.hpcp.mean', 'tonal.hpcp.stdev', 'tonal.hpcp_crest.mean', 'tonal.hpcp_crest.stdev', 'tonal.hpcp_entropy.mean', 'tonal.hpcp_entropy.stdev', 'tonal.key_edma.key', 'tonal.key_edma.scale', 'tonal.key_edma.strength', 'tonal.key_krumhansl.key', 'tonal.key_krumhansl.scale', 'tonal.key_krumhansl.strength', 'tonal.key_temperley.key', 'tonal.key_temperley.scale', 'tonal.key_temperley.strength', 'tonal.thpcp', 'tonal.tuning_diatonic_strength', 'tonal.tuning_equal_tempered_deviation', 'tonal.tuning_frequency', 'tonal.tuning_nontempered_energy_ratio']
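
Both pools can also be stored to a file. Below is a minimal sketch using the YamlOutput algorithm; the filename 'features.json' and the JSON format are arbitrary choices (YAML is the default format):

# Store the aggregated features pool to a JSON file
es.YamlOutput(filename='features.json', format='json')(features)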

You can then access particular values in the pools:

print("Filename:", features['metadata.tags.file_name'])
print("-"*80)
print("Replay gain:", features['metadata.audio_properties.replay_gain'])
print("EBU128 integrated loudness:", features['lowlevel.loudness_ebu128.integrated'])
print("EBU128 loudness range:", features['lowlevel.loudness_ebu128.loudness_range'])
print("-"*80)
print("MFCC mean:", features['lowlevel.mfcc.mean'])
print("-"*80)
print("BPM:", features['rhythm.bpm'])
print("Beat positions (sec.)", features['rhythm.beats_position'])
print("-"*80)
print("Key/scale estimation (using a profile specifically suited for electronic music):",
      features['tonal.key_edma.key'], features['tonal.key_edma.scale'])
Filename: dubstep.flac
--------------------------------------------------------------------------------
Replay gain: -13.708175659179688
EBU128 integrated loudness: -10.151906967163086
EBU128 loudness range: 0.6067428588867188
--------------------------------------------------------------------------------
MFCC mean: [-670.34771729   84.01703644   24.20178986   -4.55752945    8.39075661
   -1.40908575    6.48751211    9.55739212    1.58020043    9.26296902
   -0.74807823    5.43698931   -4.19750071]
--------------------------------------------------------------------------------
BPM: 139.98114013671875
Beat positions (sec.) [ 0.42956915  0.85913831  1.30031741  1.71827662  2.14784575  2.57741499
  2.9953742   3.42494321  3.86612248  4.29569149  4.72526073  5.15482998
  5.58439922  6.01396799  6.4319272 ]
--------------------------------------------------------------------------------
Key/scale estimation (using a profile specifically suited for electronic music): G# minor
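
The second pool returned above, features_frames, holds the raw per-frame values. A minimal sketch of accessing it, assuming it exposes a frame-level 'lowlevel.mfcc' descriptor mirroring the aggregated names above:

# Per-frame MFCC values: a 2D array with one row per analysis frame
mfcc_frames = features_frames['lowlevel.mfcc']
print("MFCC frames shape:", mfcc_frames.shape)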

Beat detection and BPM histogram

In this example we look at how to perform beat tracking using RhythmExtractor2013, mark the extracted beats on the audio using the AudioOnsetsMarker algorithm, and write the result to a file using MonoWriter.

from essentia.standard import *

# Loading audio file
audio = MonoLoader(filename='../../../test/audio/recorded/dubstep.flac')()

# Compute beat positions and BPM
rhythm_extractor = RhythmExtractor2013(method="multifeature")
bpm, beats, beats_confidence, _, beats_intervals = rhythm_extractor(audio)

print("BPM:", bpm)
print("Beat positions (sec.):", beats)
print("Beat estimation confidence:", beats_confidence)

# Mark beat positions on the audio and write it to a file
# Let's use beeps instead of white noise to mark them, as they are more distinctive
marker = AudioOnsetsMarker(onsets=beats, type='beep')
marked_audio = marker(audio)
MonoWriter(filename='audio/dubstep_beats.flac')(marked_audio)
BPM: 139.98114013671875
Beat positions (sec.): [ 0.42956915  0.85913831  1.30031741  1.71827662  2.14784575  2.57741499
  2.9953742   3.42494321  3.86612248  4.29569149  4.72526073  5.15482998
  5.58439922  6.01396799  6.4319272 ]
Beat estimation confidence: 3.9443612098693848

We can now listen to the resulting audio with beats marked by beeps. We can also visualize beat estimations.

import IPython
IPython.display.Audio('audio/dubstep_beats.flac')
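
Here is a minimal visualization sketch using matplotlib (not part of Essentia), marking the estimated beat positions on the waveform; it assumes MonoLoader's default 44100 Hz sample rate:

import matplotlib.pyplot as plt
import numpy as np

# Time axis in seconds (MonoLoader resamples to 44100 Hz by default)
time = np.arange(len(audio)) / 44100.

plt.plot(time, audio)
for beat in beats:
    plt.axvline(x=beat, color='red')  # mark each estimated beat
plt.xlabel('Time (s)')
plt.title('Waveform with estimated beat positions')
plt.show()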