Essentia Python tutorial

This is a hands-on tutorial for complete newcomers to Essentia. Essentia combines the power of computation speed of the main C++ code with the Python environment, making fast prototyping and scientific research very easy.

To follow this tutorial (and various Python examples we provide) interactively, we provide Jupyter Python notebooks. They are located in the src/examples/python folder in the source code. The notebook for this tutorial is essentia_python_tutorial.ipynb (link). If you are not familiar with Python notebooks, read how to use them here.

You should have the NumPy package installed for computations with vectors and matrices in Python and Matplotlib for plotting. Other recommended packages for scientific computing not used in this tutorial but often used with Essentia are SciPy, scikit-learn, pandas, and seaborn for visualization.

The big strength of Essentia is its extensive collection of optimized and tested algorithms for audio processing and analysis, all conveniently available within the same library. You can parametrize and re-use these algorithms for specific use-cases and your custom analysis pipelines. For more details on the algorithms, see the algorithms overview and the complete algorithms reference.

Using Essentia in standard mode

There are two modes of using Essentia, standard and streaming, and in this section, we will focus on the standard mode. See next section for the streaming mode.

We will have a look at some basic functionality:

  • how to load an audio

  • how to perform some numerical operations such as FFT

  • how to plot results

  • how to output results to a file

Exploring the Python module

Let’s investigate a bit the Essentia package.

# first, we need to import our essentia module. It is aptly named 'essentia'!
import essentia

# there are two operating modes in essentia which (mostly) have the same algorithms
# they are accessible via two submodules:
import essentia.standard
import essentia.streaming

# let's have a look at what is in there

# you can also do it by using autocompletion in Jupyter/IPython, typing "essentia.standard." and pressing Tab
['AfterMaxToBeforeMaxEnergyRatio', 'AllPass', 'AudioLoader', 'AudioOnsetsMarker', 'AudioWriter', 'AutoCorrelation', 'BFCC', 'BPF', 'BandPass', 'BandReject', 'BarkBands', 'BeatTrackerDegara', 'BeatTrackerMultiFeature', 'Beatogram', 'BeatsLoudness', 'BinaryOperator', 'BinaryOperatorStream', 'BpmHistogram', 'BpmHistogramDescriptors', 'BpmRubato', 'CartesianToPolar', 'CentralMoments', 'Centroid', 'ChordsDescriptors', 'ChordsDetection', 'ChordsDetectionBeats', 'Chromagram', 'Clipper', 'ConstantQ', 'Crest', 'CrossCorrelation', 'CubicSpline', 'DCRemoval', 'DCT', 'Danceability', 'Decrease', 'Derivative', 'DerivativeSFX', 'Dissonance', 'DistributionShape', 'Duration', 'DynamicComplexity', 'ERBBands', 'EasyLoader', 'EffectiveDuration', 'Energy', 'EnergyBand', 'EnergyBandRatio', 'Entropy', 'Envelope', 'EqloudLoader', 'EqualLoudness', 'Extractor', 'FFT', 'FFTC', 'FadeDetection', 'Flatness', 'FlatnessDB', 'FlatnessSFX', 'Flux', 'FrameCutter', 'FrameGenerator', 'FrameToReal', 'FreesoundExtractor', 'FrequencyBands', 'GFCC', 'GeometricMean', 'HFC', 'HPCP', 'HarmonicBpm', 'HarmonicMask', 'HarmonicModelAnal', 'HarmonicPeaks', 'HighPass', 'HighResolutionFeatures', 'HprModelAnal', 'HpsModelAnal', 'IDCT', 'IFFT', 'IIR', 'Inharmonicity', 'InstantPower', 'Intensity', 'Key', 'KeyExtractor', 'LPC', 'Larm', 'Leq', 'LevelExtractor', 'LogAttackTime', 'LoopBpmConfidence', 'LoopBpmEstimator', 'Loudness', 'LoudnessEBUR128', 'LoudnessVickers', 'LowLevelSpectralEqloudExtractor', 'LowLevelSpectralExtractor', 'LowPass', 'MFCC', 'Magnitude', 'MaxFilter', 'MaxMagFreq', 'MaxToTotal', 'Mean', 'Median', 'MelBands', 'MetadataReader', 'Meter', 'MinToTotal', 'MonoLoader', 'MonoMixer', 'MonoWriter', 'MovingAverage', 'MultiPitchKlapuri', 'MultiPitchMelodia', 'Multiplexer', 'MusicExtractor', 'NoiseAdder', 'NoveltyCurve', 'NoveltyCurveFixedBpmEstimator', 'OddToEvenHarmonicEnergyRatio', 'OnsetDetection', 'OnsetDetectionGlobal', 'OnsetRate', 'Onsets', 'OverlapAdd', 'PCA', 'Panning', 'PeakDetection', 'PercivalBpmEstimator', 'PercivalEnhanceHarmonics', 'PercivalEvaluatePulseTrains', 'PitchContourSegmentation', 'PitchContours', 'PitchContoursMelody', 'PitchContoursMonoMelody', 'PitchContoursMultiMelody', 'PitchFilter', 'PitchMelodia', 'PitchSalience', 'PitchSalienceFunction', 'PitchSalienceFunctionPeaks', 'PitchYin', 'PitchYinFFT', 'PolarToCartesian', 'PoolAggregator', 'PowerMean', 'PowerSpectrum', 'PredominantPitchMelodia', 'RMS', 'RawMoments', 'ReplayGain', 'Resample', 'ResampleFFT', 'RhythmDescriptors', 'RhythmExtractor', 'RhythmExtractor2013', 'RhythmTransform', 'RollOff', 'SBic', 'Scale', 'SilenceRate', 'SineModelAnal', 'SineModelSynth', 'SineSubtraction', 'SingleBeatLoudness', 'SingleGaussian', 'Slicer', 'SpectralCentroidTime', 'SpectralComplexity', 'SpectralContrast', 'SpectralPeaks', 'SpectralWhitening', 'Spectrum', 'SpectrumCQ', 'SpectrumToCent', 'Spline', 'SprModelAnal', 'SprModelSynth', 'SpsModelAnal', 'SpsModelSynth', 'StartStopSilence', 'StereoDemuxer', 'StereoMuxer', 'StereoTrimmer', 'StochasticModelAnal', 'StochasticModelSynth', 'StrongDecay', 'StrongPeak', 'SuperFluxExtractor', 'SuperFluxNovelty', 'SuperFluxPeaks', 'TCToTotal', 'TempoScaleBands', 'TempoTap', 'TempoTapDegara', 'TempoTapMaxAgreement', 'TempoTapTicks', 'TonalExtractor', 'TonicIndianArtMusic', 'TriangularBands', 'TriangularBarkBands', 'Trimmer', 'Tristimulus', 'TuningFrequency', 'TuningFrequencyExtractor', 'UnaryOperator', 'UnaryOperatorStream', 'Variance', 'Vibrato', 'WarpedAutoCorrelation', 'Windowing', 'YamlInput', 'YamlOutput', 'ZeroCrossingRate', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_c', '_create_essentia_class', '_create_python_algorithms', '_essentia', '_reloadAlgorithms', '_sys', 'algorithmInfo', 'algorithmNames', 'copy', 'essentia', 'iteritems']

This list contains all Essentia algorithms available in standard mode. You can have an inline help for the algorithms you are interested in using help command (you can also see it by typing MFCC in Jupyter/IPython). You can also use our online algorithm reference.

Help on class Algo in module essentia.standard:

class Algo(Algorithm)
 |  MFCC
 |  Inputs:
 |    [vector_real] spectrum - the audio spectrum
 |  Outputs:
 |    [vector_real] bands - the energies in mel bands
 |    [vector_real] mfcc - the mel frequency cepstrum coefficients
 |  Parameters:
 |    dctType:
 |      integer ∈ [2,3] (default = 2)
 |      the DCT type
 |    highFrequencyBound:
 |      real ∈ (0,inf) (default = 11000)
 |      the upper bound of the frequency range [Hz]
 |    inputSize:
 |      integer ∈ (1,inf) (default = 1025)
 |      the size of input spectrum
 |    liftering:
 |      integer ∈ [0,inf) (default = 0)
 |      the liftering coefficient. Use '0' to bypass it
 |    logType:
 |      string ∈ {natural,dbpow,dbamp,log} (default = "dbamp")
 |      logarithmic compression type. Use 'dbpow' if working with power and 'dbamp'
 |      if working with magnitudes
 |    lowFrequencyBound:
 |      real ∈ [0,inf) (default = 0)
 |      the lower bound of the frequency range [Hz]
 |    normalize:
 |      string ∈ {unit_sum,unit_max} (default = "unit_sum")
 |      'unit_max' makes the vertex of all the triangles equal to 1, 'unit_sum'
 |      makes the area of all the triangles equal to 1
 |    numberBands:
 |      integer ∈ [1,inf) (default = 40)
 |      the number of mel-bands in the filter
 |    numberCoefficients:
 |      integer ∈ [1,inf) (default = 13)
 |      the number of output mel coefficients
 |    sampleRate:
 |      real ∈ (0,inf) (default = 44100)
 |      the sampling rate of the audio signal [Hz]
 |    type:
 |      string ∈ {magnitude,power} (default = "power")
 |      use magnitude or power spectrum
 |    warpingFormula:
 |      string ∈ {slaneyMel,htkMel} (default = "slaneyMel")
 |      The scale implementation type. use 'htkMel' to emulate its behaviour.
 |      Default slaneyMel.
 |    weighting:
 |      string ∈ {warping,linear} (default = "warping")
 |      type of weighting function for determining triangle area
 |  Description:
 |    This algorithm computes the mel-frequency cepstrum coefficients of a
 |    spectrum. As there is no standard implementation, the MFCC-FB40 is used by
 |    default:
 |      - filterbank of 40 bands from 0 to 11000Hz
 |      - take the log value of the spectrum energy in each mel band
 |      - DCT of the 40 bands down to 13 mel coefficients
 |    There is a paper describing various MFCC implementations [1].
 |    The parameters of this algorithm can be configured in order to behave like
 |    HTK [3] as follows:
 |      - type = 'magnitude'
 |      - warpingFormula = 'htkMel'
 |      - weighting = 'linear'
 |      - highFrequencyBound = 8000
 |      - numberBands = 26
 |      - numberCoefficients = 13
 |      - normalize = 'unit_max'
 |      - dctType = 3
 |      - logType = 'log'
 |      - liftering = 22
 |    In order to completely behave like HTK the audio signal has to be scaled by
 |    2^15 before the processing and if the Windowing and FrameCutter algorithms
 |    are used they should also be configured as follows.
 |    FrameGenerator:
 |      - frameSize = 1102
 |      - hopSize = 441
 |      - startFromZero = True
 |      - validFrameThresholdRatio = 1
 |    Windowing:
 |      - type = 'hamming'
 |      - size = 1102
 |      - zeroPadding = 946
 |      - normalized = False
 |    This algorithm depends on the algorithms MelBands and DCT and therefore
 |    inherits their parameter restrictions. An exception is thrown if any of these
 |    restrictions are not met. The input "spectrum" is passed to the MelBands
 |    algorithm and thus imposes MelBands' input requirements. Exceptions are
 |    inherited by MelBands as well as by DCT.
 |    IDCT can be used to compute smoothed Mel Bands. In order to do this:
 |     - compute MFCC
 |    - smoothedMelBands = 10^(IDCT(MFCC)/20)
 |    Note: The second step assumes that 'logType' = 'dbamp' was used to compute
 |    MFCCs, otherwise that formula should be changed in order to be consistent.
 |    References:
 |      [1] T. Ganchev, N. Fakotakis, and G. Kokkinakis, "Comparative evaluation
 |      of various MFCC implementations on the speaker verification task," in
 |      International Conference on Speach and Computer (SPECOM’05), 2005,
 |      vol. 1, pp. 191–194.
 |      [2] Mel-frequency cepstrum - Wikipedia, the free encyclopedia,
 |      [3] Young, S. J., Evermann, G., Gales, M. J. F., Hain, T., Kershaw, D.,
 |      Liu, X., … Woodland, P. C. (2009). The HTK Book (for HTK Version 3.4).
 |      Construction, (July 2000), 384,
 |  Method resolution order:
 |      Algo
 |      Algorithm
 |      builtins.object
 |  Methods defined here:
 |  __call__(self, *args)
 |  __init__(self, **kwargs)
 |  __str__(self)
 |  compute(self, *args)
 |  configure(self, **kwargs)
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  __struct__ = {'category': 'Spectral', 'description': 'This algorithm c...
 |  ----------------------------------------------------------------------
 |  Methods inherited from Algorithm:
 |  __compute__(...)
 |      compute the algorithm
 |  __configure__(...)
 |      Configure the algorithm
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  getDoc(...)
 |      Returns the doc string for the algorithm
 |  getStruct(...)
 |      Returns the doc struct for the algorithm
 |  inputNames(...)
 |      Returns the names of the inputs of the algorithm.
 |  inputType(...)
 |      Returns the type of the input given by its name
 |  name(...)
 |      Returns the name of the algorithm.
 |  outputNames(...)
 |      Returns the names of the outputs of the algorithm.
 |  paramType(...)
 |      Returns the type of the parameter given by its name
 |  paramValue(...)
 |      Returns the value of the parameter or None if not yet configured
 |  parameterNames(...)
 |      Returns the names of the parameters for this algorithm.
 |  reset(...)
 |      Reset the algorithm to its initial state (if any).

Instantiating our first algorithm, loading some audio

Before you can use algorithms in Essentia, you first need to instantiate (create) them. When doing so, you can give them parameters which they may need to work properly, such as the filename of the audio file in the case of an audio loader.

Once you have instantiated an algorithm, nothing has happened yet, but your algorithm is ready to be used and works like a function, that is, you have to call it to make stuff happen (technically, it is a function object).

Essentia has a selection of audio loaders:

  • AudioLoader: the most generic one, returns the audio samples, sampling rate and number of channels, and some other related information

  • MonoLoader: returns audio, down-mixed and resampled to a given sampling rate

  • EasyLoader: a MonoLoader which can optionally trim start/end slices and rescale according to a ReplayGain value

  • EqloudLoader: an EasyLoader that applies an equal-loudness filtering to the audio

# we start by instantiating the audio loader:
loader = essentia.standard.MonoLoader(filename='../../../test/audio/recorded/dubstep.wav')

# and then we actually perform the loading:
audio = loader()
# This is how the audio we want to process sounds like
import IPython