standard mode | Tonal category


  • logSpectrogram (vector_vector_real) - log spectrum frames

  • meanTuning (vector_real) - mean tuning frames

  • localTuning (vector_real) - local tuning frames


  • tunedLogfreqSpectrum (vector_vector_real) - Log frequency spectrum after tuning

  • semitoneSpectrum (vector_vector_real) - a spectral representation with one bin per semitone

  • bassChromagram (vector_vector_real) - a 12-dimensional chromagram, restricted to the bass range

  • chromagram (vector_vector_real) - a 12-dimensional chromagram, restricted with mid-range emphasis


  • chromaNormalization (string ∈ {none, maximum, L1, L2}, default = none) :

    determines whether or how the chromagrams are normalised

  • frameSize (integer ∈ (1, ∞), default = 1025) :

    the input frame size of the spectrum vector

  • sampleRate (real ∈ (0, ∞), default = 44100) :

    the input sample rate

  • spectralShape (real ∈ (0.5, 0.9), default = 0.7) :

    the shape of the notes in the NNLS dictionary

  • spectralWhitening (real ∈ [0, 1.0], default = 1) :

    determines how much the log-frequency spectrum is whitened

  • tuningMode (string ∈ {global, local}, default = global) :

    local uses a local average for tuning, global uses all audio frames. Local tuning is only advisable when the tuning is likely to change over the audio

  • useNNLS (bool ∈ {true, false}, default = true) :

    toggle between NNLS approximate transcription and linear spectral mapping


This algorithm extracts treble and bass chromagrams from a sequence of log-frequency spectrum frames. On this representation, two processing steps are performed:

-tuning, after which each centre bin (i.e. bin 2, 5, 8, …) corresponds to a semitone, even if the tuning of the piece deviates from 440 Hz standard pitch. -running standardisation: subtraction of the running mean, division by the running standard deviation. This has a spectral whitening effect.

This code is ported from NNLS Chroma [1, 2]. To achieve similar results follow this processing chain: frame slicing with sample rate = 44100, frame size = 16384, hop size = 2048 -> Windowing with Hann and no normalization -> Spectrum -> LogSpectrum.


[1] Mauch, M., & Dixon, S. (2010, August). Approximate Note Transcription for the Improved Identification of Difficult Chords. In ISMIR (pp. 135-140). [2] Chordino and NNLS Chroma, http://www.isophonics.net/nnls-chroma

Source code

See also

LogSpectrum (standard) LogSpectrum (streaming) NNLSChroma (streaming) Spectrum (standard) Spectrum (streaming) Windowing (standard) Windowing (streaming)