Essentia Python tutorial

This is a hands-on tutorial for complete newcomers to Essentia. Essentia combines the computational speed of its C++ core with the convenience of the Python environment, which makes fast prototyping and scientific research very easy.

First and foremost, if you are new to Python, we recommend using the IPython interactive shell or Jupyter notebooks instead of the standard Python interpreter. If you are familiar with notebooks, you may want to use the one created for this tutorial for a more interactive experience. It can be found in the src/examples/tutorial folder, in the essentia_python_tutorial.ipynb file. Read how to use Python notebooks here.

You should have the NumPy package installed, which gives Python the ability to work with vectors and matrices in pretty much the same way as Matlab. You can also install SciPy, which provides functionality similar to Matlab’s toolboxes, although we won’t be using it in this tutorial. You should have the matplotlib package installed if you want to be able to do some plotting. Other recommended packages include scikit-learn for data analysis and machine learning and seaborn for visualization.

The big strength of Essentia is its large collection of audio processing and analysis algorithms, which have been optimized and tested and which you can rely on to build your own signal analysis. In other words, you often do not have to hunt through many toolboxes to achieve what you want. For more details on the algorithms, have a look either at the algorithms overview or at the complete reference.

Using Essentia in standard mode

In this section we will focus on how to use Essentia in the standard mode (think Matlab). There is another section that you can read afterwards about using the streaming mode.

We will have a look at some basic functionality:

  • how to load an audio file
  • how to perform some numerical operations such as FFT
  • how to plot results
  • how to output results to a file

Note: all the following commands need to be typed in a python interpreter. You can use IPython (start it with the --pylab option to have interactive plots) or Jupyter notebooks.

Exploring the python module

Let’s take a closer look at the Essentia package.

# first, we need to import our essentia module. It is aptly named 'essentia'!
import essentia

# Essentia has 2 operating modes that share the same algorithms,
# so the algorithms are split across 2 submodules:
import essentia.standard
import essentia.streaming

# let's have a look at what is in there
print(dir(essentia.standard))

# you can also do it by using autocompletion in IPython, typing "essentia.standard." and pressing Tab
['AfterMaxToBeforeMaxEnergyRatio', 'AllPass', 'AudioLoader', 'AudioOnsetsMarker', 'AudioWriter', 'AutoCorrelation', 'BFCC', 'BPF', 'BandPass', 'BandReject', 'BarkBands', 'BeatTrackerDegara', 'BeatTrackerMultiFeature', 'Beatogram', 'BeatsLoudness', 'BinaryOperator', 'BinaryOperatorStream', 'BpmHistogram', 'BpmHistogramDescriptors', 'BpmRubato', 'CartesianToPolar', 'CentralMoments', 'Centroid', 'ChordsDescriptors', 'ChordsDetection', 'ChordsDetectionBeats', 'Chromagram', 'Clipper', 'ConstantQ', 'Crest', 'CrossCorrelation', 'CubicSpline', 'DCRemoval', 'DCT', 'Danceability', 'Decrease', 'Derivative', 'DerivativeSFX', 'Dissonance', 'DistributionShape', 'Duration', 'DynamicComplexity', 'ERBBands', 'EasyLoader', 'EffectiveDuration', 'Energy', 'EnergyBand', 'EnergyBandRatio', 'Entropy', 'Envelope', 'EqloudLoader', 'EqualLoudness', 'Extractor', 'FFT', 'FFTC', 'FadeDetection', 'Flatness', 'FlatnessDB', 'FlatnessSFX', 'Flux', 'FrameCutter', 'FrameGenerator', 'FrameToReal', 'FreesoundExtractor', 'FrequencyBands', 'GFCC', 'GeometricMean', 'HFC', 'HPCP', 'HarmonicBpm', 'HarmonicMask', 'HarmonicModelAnal', 'HarmonicPeaks', 'HighPass', 'HighResolutionFeatures', 'HprModelAnal', 'HpsModelAnal', 'IDCT', 'IFFT', 'IIR', 'Inharmonicity', 'InstantPower', 'Intensity', 'Key', 'KeyExtractor', 'LPC', 'Larm', 'Leq', 'LevelExtractor', 'LogAttackTime', 'LoopBpmConfidence', 'LoopBpmEstimator', 'Loudness', 'LoudnessEBUR128', 'LoudnessVickers', 'LowLevelSpectralEqloudExtractor', 'LowLevelSpectralExtractor', 'LowPass', 'MFCC', 'Magnitude', 'MaxFilter', 'MaxMagFreq', 'MaxToTotal', 'Mean', 'Median', 'MelBands', 'MetadataReader', 'Meter', 'MinToTotal', 'MonoLoader', 'MonoMixer', 'MonoWriter', 'MovingAverage', 'MultiPitchKlapuri', 'MultiPitchMelodia', 'Multiplexer', 'MusicExtractor', 'NoiseAdder', 'NoveltyCurve', 'NoveltyCurveFixedBpmEstimator', 'OddToEvenHarmonicEnergyRatio', 'OnsetDetection', 'OnsetDetectionGlobal', 'OnsetRate', 'Onsets', 'OverlapAdd', 'PCA', 'Panning', 'PeakDetection', 
'PercivalBpmEstimator', 'PercivalEnhanceHarmonics', 'PercivalEvaluatePulseTrains', 'PitchContourSegmentation', 'PitchContours', 'PitchContoursMelody', 'PitchContoursMonoMelody', 'PitchContoursMultiMelody', 'PitchFilter', 'PitchMelodia', 'PitchSalience', 'PitchSalienceFunction', 'PitchSalienceFunctionPeaks', 'PitchYin', 'PitchYinFFT', 'PolarToCartesian', 'PoolAggregator', 'PowerMean', 'PowerSpectrum', 'PredominantPitchMelodia', 'RMS', 'RawMoments', 'ReplayGain', 'Resample', 'ResampleFFT', 'RhythmDescriptors', 'RhythmExtractor', 'RhythmExtractor2013', 'RhythmTransform', 'RollOff', 'SBic', 'Scale', 'SilenceRate', 'SineModelAnal', 'SineModelSynth', 'SineSubtraction', 'SingleBeatLoudness', 'SingleGaussian', 'Slicer', 'SpectralCentroidTime', 'SpectralComplexity', 'SpectralContrast', 'SpectralPeaks', 'SpectralWhitening', 'Spectrum', 'SpectrumCQ', 'SpectrumToCent', 'Spline', 'SprModelAnal', 'SprModelSynth', 'SpsModelAnal', 'SpsModelSynth', 'StartStopSilence', 'StereoDemuxer', 'StereoMuxer', 'StereoTrimmer', 'StochasticModelAnal', 'StochasticModelSynth', 'StrongDecay', 'StrongPeak', 'SuperFluxExtractor', 'SuperFluxNovelty', 'SuperFluxPeaks', 'TCToTotal', 'TempoScaleBands', 'TempoTap', 'TempoTapDegara', 'TempoTapMaxAgreement', 'TempoTapTicks', 'TonalExtractor', 'TonicIndianArtMusic', 'TriangularBands', 'TriangularBarkBands', 'Trimmer', 'Tristimulus', 'TuningFrequency', 'TuningFrequencyExtractor', 'UnaryOperator', 'UnaryOperatorStream', 'Variance', 'Vibrato', 'WarpedAutoCorrelation', 'Windowing', 'YamlInput', 'YamlOutput', 'ZeroCrossingRate', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_c', '_create_essentia_class', '_create_python_algorithms', '_essentia', '_reloadAlgorithms', '_sys', 'algorithmInfo', 'algorithmNames', 'copy', 'essentia', 'iteritems']

This list contains all Essentia algorithms available in standard mode. You can get inline help for the algorithms you are interested in with the help command (in IPython, you can also type “MFCC?”). You can also use our online algorithm reference.
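The listing above mixes algorithm names with module internals such as '_c', 'copy' and 'algorithmInfo'. As a minimal sketch, assuming that algorithm classes are exactly the names starting with a capital letter (which holds for the listing shown here, but is not a documented guarantee), the helper below filters a module's contents. The function name `algorithm_names` and the stand-in namespace are hypothetical and only serve to keep the example runnable without Essentia.

```python
import types


def algorithm_names(module):
    """Return the capitalised public names found in `module`, sorted."""
    return sorted(name for name in dir(module) if name[:1].isupper())


# Stand-in for essentia.standard so the sketch runs without Essentia installed;
# the attribute names are taken from the listing above.
fake_standard = types.SimpleNamespace(
    MFCC=object, MonoLoader=object, Windowing=object,
    algorithmInfo=None, copy=None, _c=None,
)

print(algorithm_names(fake_standard))  # ['MFCC', 'MonoLoader', 'Windowing']
```

With Essentia installed, calling `algorithm_names(essentia.standard)` should return the listing above without the trailing helper entries.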

help(essentia.standard.MFCC)
Help on class Algo in module essentia.standard:

class Algo(Algorithm)
 |  MFCC
 |
 |
 |  Inputs:
 |
 |    [vector_real] spectrum - the audio spectrum
 |
 |
 |  Outputs:
 |
 |    [vector_real] bands - the energies in mel bands
 |    [vector_real] mfcc - the mel frequency cepstrum coefficients
 |
 |
 |  Parameters:
 |
 |    dctType:
 |      integer ∈ [2,3] (default = 2)
 |      the DCT type
 |
 |    highFrequencyBound:
 |      real ∈ (0,inf) (default = 11000)
 |      the upper bound of the frequency range [Hz]
 |
 |    inputSize:
 |      integer ∈ (1,inf) (default = 1025)
 |      the size of input spectrum
 |
 |    liftering:
 |      integer ∈ [0,inf) (default = 0)
 |      the liftering coefficient. Use '0' to bypass it
 |
 |    logType:
 |      string ∈ {natural,dbpow,dbamp,log} (default = "dbamp")
 |      logarithmic compression type. Use 'dbpow' if working with power and 'dbamp'
 |      if working with magnitudes
 |
 |    lowFrequencyBound:
 |      real ∈ [0,inf) (default = 0)
 |      the lower bound of the frequency range [Hz]
 |
 |    normalize:
 |      string ∈ {unit_sum,unit_max} (default = "unit_sum")
 |      'unit_max' makes the vertex of all the triangles equal to 1, 'unit_sum'
 |      makes the area of all the triangles equal to 1
 |
 |    numberBands:
 |      integer ∈ [1,inf) (default = 40)
 |      the number of mel-bands in the filter
 |
 |    numberCoefficients:
 |      integer ∈ [1,inf) (default = 13)
 |      the number of output mel coefficients
 |
 |    sampleRate:
 |      real ∈ (0,inf) (default = 44100)
 |      the sampling rate of the audio signal [Hz]
 |
 |    type:
 |      string ∈ {magnitude,power} (default = "power")
 |      use magnitude or power spectrum
 |
 |    warpingFormula:
 |      string ∈ {slaneyMel,htkMel} (default = "slaneyMel")
 |      The scale implementation type. use 'htkMel' to emulate its behaviour.
 |      Default slaneyMel.
 |
 |    weighting:
 |      string ∈ {warping,linear} (default = "warping")
 |      type of weighting function for determining triangle area
 |
 |
 |  Description:
 |
 |    This algorithm computes the mel-frequency cepstrum coefficients of a
 |    spectrum. As there is no standard implementation, the MFCC-FB40 is used by
 |    default:
 |      - filterbank of 40 bands from 0 to 11000Hz
 |      - take the log value of the spectrum energy in each mel band
 |      - DCT of the 40 bands down to 13 mel coefficients
 |    There is a paper describing various MFCC implementations [1].
 |
 |    The parameters of this algorithm can be configured in order to behave like
 |    HTK [3] as follows:
 |      - type = 'magnitude'
 |      - warpingFormula = 'htkMel'
 |      - weighting = 'linear'
 |      - highFrequencyBound = 8000
 |      - numberBands = 26
 |      - numberCoefficients = 13
 |      - normalize = 'unit_max'
 |      - dctType = 3
 |      - logType = 'log'
 |      - liftering = 22
 |
 |    In order to completely behave like HTK the audio signal has to be scaled by
 |    2^15 before the processing and if the Windowing and FrameCutter algorithms
 |    are used they should also be configured as follows.
 |
 |    FrameGenerator:
 |      - frameSize = 1102
 |      - hopSize = 441
 |      - startFromZero = True
 |      - validFrameThresholdRatio = 1
 |
 |    Windowing:
 |      - type = 'hamming'
 |      - size = 1102
 |      - zeroPadding = 946
 |      - normalized = False
 |
 |    This algorithm depends on the algorithms MelBands and DCT and therefore
 |    inherits their parameter restrictions. An exception is thrown if any of these
 |    restrictions are not met. The input "spectrum" is passed to the MelBands
 |    algorithm and thus imposes MelBands' input requirements. Exceptions are
 |    inherited by MelBands as well as by DCT.
 |
 |    IDCT can be used to compute smoothed Mel Bands. In order to do this:
 |     - compute MFCC
 |    - smoothedMelBands = 10^(IDCT(MFCC)/20)
 |
 |    Note: The second step assumes that 'logType' = 'dbamp' was used to compute
 |    MFCCs, otherwise that formula should be changed in order to be consistent.
 |
 |    References:
 |      [1] T. Ganchev, N. Fakotakis, and G. Kokkinakis, "Comparative evaluation
 |      of various MFCC implementations on the speaker verification task," in
 |      International Conference on Speach and Computer (SPECOM’05), 2005,
 |      vol. 1, pp. 191–194.
 |
 |      [2] Mel-frequency cepstrum - Wikipedia, the free encyclopedia,
 |      http://en.wikipedia.org/wiki/Mel_frequency_cepstral_coefficient
 |
 |      [3] Young, S. J., Evermann, G., Gales, M. J. F., Hain, T., Kershaw, D.,
 |      Liu, X., … Woodland, P. C. (2009). The HTK Book (for HTK Version 3.4).
 |      Construction, (July 2000), 384, https://doi.org/http://htk.eng.cam.ac.uk
 |
 |  Method resolution order:
 |      Algo
 |      Algorithm
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __call__(self, *args)
 |
 |  __init__(self, **kwargs)
 |
 |  __str__(self)
 |
 |  compute(self, *args)
 |
 |  configure(self, **kwargs)
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  __struct__ = {'category': 'Spectral', 'description': 'This algorithm c...
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from Algorithm:
 |
 |  __compute__(...)
 |      compute the algorithm
 |
 |  __configure__(...)
 |      Configure the algorithm
 |
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |
 |  getDoc(...)
 |      Returns the doc string for the algorithm
 |
 |  getStruct(...)
 |      Returns the doc struct for the algorithm
 |
 |  inputNames(...)
 |      Returns the names of the inputs of the algorithm.
 |
 |  inputType(...)
 |      Returns the type of the input given by its name
 |
 |  name(...)
 |      Returns the name of the algorithm.
 |
 |  outputNames(...)
 |      Returns the names of the outputs of the algorithm.
 |
 |  paramType(...)
 |      Returns the type of the parameter given by its name
 |
 |  paramValue(...)
 |      Returns the value of the parameter or None if not yet configured
 |
 |  parameterNames(...)
 |      Returns the names of the parameters for this algorithm.
 |
 |  reset(...)
 |      Reset the algorithm to its initial state (if any).
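
The HTK-compatible settings described in the help text can be gathered in one place. The sketch below copies the parameter values from that text and checks the frame arithmetic: a 1102-sample frame padded with 946 zeros gives a 2048-point FFT, whose spectrum has 1025 bins, matching the default inputSize. The constant names are illustrative, and the final instantiation only runs if Essentia is importable.

```python
# Parameter values copied from the "behave like HTK" list in the help text.
HTK_MFCC_PARAMS = {
    "type": "magnitude",
    "warpingFormula": "htkMel",
    "weighting": "linear",
    "highFrequencyBound": 8000.0,  # Hz
    "numberBands": 26,
    "numberCoefficients": 13,
    "normalize": "unit_max",
    "dctType": 3,
    "logType": "log",
    "liftering": 22,
}

SAMPLE_RATE = 44100   # default sampleRate in the help text, in Hz
FRAME_SIZE = 1102     # FrameGenerator frameSize / Windowing size
HOP_SIZE = 441        # FrameGenerator hopSize
ZERO_PADDING = 946    # Windowing zeroPadding

fft_size = FRAME_SIZE + ZERO_PADDING          # samples after zero-padding
spectrum_size = fft_size // 2 + 1             # bins in the one-sided spectrum
frame_ms = 1000 * FRAME_SIZE / SAMPLE_RATE    # frame length in milliseconds
hop_ms = 1000 * HOP_SIZE / SAMPLE_RATE        # hop length in milliseconds

print(fft_size, spectrum_size, round(frame_ms, 1), round(hop_ms, 1))
# 2048 1025 25.0 10.0

try:
    import essentia.standard as es
except ImportError:
    es = None  # Essentia not installed: only the arithmetic above runs

if es is not None:
    # Instantiate MFCC with the HTK-style configuration.
    mfcc_htk = es.MFCC(inputSize=spectrum_size, **HTK_MFCC_PARAMS)
```

The frame and hop sizes correspond to roughly 25 ms and 10 ms at 44100 Hz. Note that the help text also asks for the audio to be scaled by 2^15 before processing, which this sketch does not do.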

Instantiating our first algorithm, loading some audio

Before you can use algorithms in Essentia, you first need to instantiate (create) them. When doing so, you can give them parameters which they may need to work properly, such as the filename of the audio file in the case of an audio loader.

Instantiating an algorithm does not run it. The instance is ready to be used and works like a function: you have to call it to make something happen (technically, it is a function object).
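
To make the function-object idea concrete, here is a toy class in plain Python. It is not Essentia code: `ToyGain` and its `factor` parameter are made up for illustration, and only the method names (`__init__`, `configure`, `compute`, `__call__`) mirror those shown in the help output for MFCC.

```python
class ToyGain:
    """Toy stand-in for an algorithm: configured once, then called like a function."""

    def __init__(self, **kwargs):
        self.factor = 1.0
        self.configure(**kwargs)

    def configure(self, **kwargs):
        # Store parameters only; no signal is processed here.
        self.factor = kwargs.get("factor", self.factor)

    def compute(self, samples):
        # The actual work: scale every sample by the configured factor.
        return [self.factor * s for s in samples]

    def __call__(self, *args):
        # Calling the instance is what triggers the computation.
        return self.compute(*args)


gain = ToyGain(factor=0.5)        # instantiation: nothing is computed yet
print(gain([1.0, -2.0, 4.0]))     # [0.5, -1.0, 2.0]

gain.configure(factor=2.0)        # reconfigure the same instance
print(gain([1.0, -2.0, 4.0]))     # [2.0, -4.0, 8.0]
```

The same pattern applies to the loader used next: creating it stores the filename, and calling it performs the loading.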

Essentia has a selection of audio loaders:

  • AudioLoader: the most generic one, returns the audio samples, sampling rate and number of channels, and some other related information
  • MonoLoader: returns audio, down-mixed and resampled to a given sampling rate
  • EasyLoader: a MonoLoader which can optionally trim start/end slices and rescale according to a ReplayGain value
  • EqloudLoader: an EasyLoader that applies equal-loudness filtering to the audio

# we start by instantiating the audio loader:
loader = essentia.standard.MonoLoader(filename='../../../test/audio/recorded/dubstep.wav')

# and then we actually perform the loading:
audio = loader()
# This is what the audio we want to process sounds like
import IPython.display
IPython.display.Audio('../../../test/audio/recorded/dubstep.wav')
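
Once loaded, the audio can be inspected with ordinary NumPy operations. The sketch below uses a synthetic sine wave as a stand-in for `audio` so that it runs without the test file; it assumes, as we understand MonoLoader's defaults, a 44100 Hz sampling rate and single-precision samples. The name `audio_stub` is hypothetical.

```python
import numpy as np

SAMPLE_RATE = 44100  # assumed MonoLoader default, in Hz

# Synthetic stand-in for `audio`: two seconds of a 440 Hz sine, single precision.
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE
audio_stub = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

duration = len(audio_stub) / SAMPLE_RATE      # length in seconds
peak = float(np.max(np.abs(audio_stub)))      # largest absolute sample value
first_second = audio_stub[:SAMPLE_RATE]       # slice by sample index

print(audio_stub.dtype, audio_stub.shape, duration, round(peak, 2))
```

Replacing `audio_stub` with the `audio` array returned by the loader should give the duration and peak level of the dubstep excerpt.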