standard mode | Pitch category


  • contoursBins (vector_vector_real) - array of frame-wise vectors of cent bin values representing each contour

  • contoursSaliences (vector_vector_real) - array of frame-wise vectors of pitch saliences representing each contour

  • contoursStartTimes (vector_real) - array of the start times of each contour [s]

  • duration (real) - time duration of the input signal [s]


  • pitch (vector_real) - vector of estimated pitch values (i.e., melody) [Hz]

  • pitchConfidence (vector_real) - confidence with which the pitch was detected


  • binResolution (real ∈ (0, ∞), default = 10) :

    salience function bin resolution [cents]

  • filterIterations (integer ∈ [1, ∞), default = 3) :

    number of iterations for the octave-error / pitch-outlier filtering process

  • guessUnvoiced (bool ∈ {false, true}, default = false) :

    estimate pitch for unvoiced segments by using non-salient contours when no salient ones are present in a frame

  • hopSize (integer ∈ (0, ∞), default = 128) :

    the hop size [samples] with which the pitch salience function was computed

  • maxFrequency (real ∈ [0, ∞), default = 20000) :

    the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]

  • minFrequency (real ∈ [0, ∞), default = 80) :

    the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]

  • referenceFrequency (real ∈ (0, ∞), default = 55) :

    the reference frequency for Hertz-to-cent conversion [Hz], corresponding to the 0th cent bin

  • sampleRate (real ∈ (0, ∞), default = 44100) :

    the sampling rate of the audio signal [Hz]

  • voiceVibrato (bool ∈ {false, true}, default = false) :

    detect voice vibrato

  • voicingTolerance (real ∈ [-1.0, 1.4], default = 0.2) :

    allowed deviation below the average of the mean saliences of all contours, expressed as a fraction of the standard deviation of those mean saliences
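The binResolution and referenceFrequency parameters together define the cent-bin scale used by "contoursBins", and minFrequency/maxFrequency bound that scale. A minimal sketch of the conversion, assuming that bin b lies b × binResolution cents above referenceFrequency (our reading of the two parameter descriptions, not taken from the implementation); hz_to_bin and bin_to_hz are hypothetical helper names.

```python
import math


def hz_to_bin(freq_hz: float, reference_frequency: float = 55.0,
              bin_resolution: float = 10.0) -> float:
    """Convert a frequency in Hz to a (fractional) cent bin.

    Assumption: bin 0 is reference_frequency, and each bin spans
    bin_resolution cents.
    """
    cents = 1200.0 * math.log2(freq_hz / reference_frequency)
    return cents / bin_resolution


def bin_to_hz(bin_value: float, reference_frequency: float = 55.0,
              bin_resolution: float = 10.0) -> float:
    """Inverse of hz_to_bin: map a (fractional) cent bin back to Hz."""
    return reference_frequency * 2.0 ** (bin_value * bin_resolution / 1200.0)


if __name__ == "__main__":
    print(hz_to_bin(110.0))  # one octave (1200 cents) above 55 Hz -> 120.0
    print(bin_to_hz(120.0))  # -> 110.0
    # Bin range spanned by the default minFrequency and maxFrequency
    print(hz_to_bin(80.0), hz_to_bin(20000.0))
```

With the defaults, one octave covers 1200 / 10 = 120 bins, so a contour moving by 120 bins has doubled in frequency.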


This algorithm converts a set of pitch contours into a sequence of predominant f0 values in Hz by taking the value of the most predominant contour in each frame. This algorithm is intended to receive its “contoursBins”, “contoursSaliences”, and “contoursStartTimes” inputs from the PitchContours algorithm. The “duration” input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values and a vector of confidence values.
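The core idea of taking the most predominant contour in each frame can be sketched as follows. This is deliberately simplified and is not the algorithm's implementation: it skips voicing detection (voicingTolerance), octave-error and outlier filtering (filterIterations), vibrato detection and guessUnvoiced, and simply keeps, per frame, the contour with the highest salience. The frame-count rule round(duration / (hopSize / sampleRate)), the use of raw salience as confidence, and the name simple_melody are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def simple_melody(contours_bins: Sequence[Sequence[float]],
                  contours_saliences: Sequence[Sequence[float]],
                  contours_start_times: Sequence[float],
                  duration: float,
                  hop_size: int = 128,
                  sample_rate: float = 44100.0,
                  reference_frequency: float = 55.0,
                  bin_resolution: float = 10.0) -> Tuple[List[float], List[float]]:
    """Hypothetical, simplified per-frame contour selection."""
    frame_duration = hop_size / sample_rate
    # Assumption: number of output frames derived from the signal duration.
    n_frames = int(round(duration / frame_duration))
    pitch = [0.0] * n_frames       # 0 Hz marks frames with no contour
    confidence = [0.0] * n_frames  # 0 marks unvoiced frames

    for bins, sals, start in zip(contours_bins, contours_saliences,
                                 contours_start_times):
        first = int(round(start / frame_duration))  # frame of contour onset
        for offset, (b, s) in enumerate(zip(bins, sals)):
            frame = first + offset
            # Keep the most salient contour seen so far in this frame.
            if 0 <= frame < n_frames and s > confidence[frame]:
                confidence[frame] = s
                pitch[frame] = reference_frequency * 2.0 ** (
                    b * bin_resolution / 1200.0)
    return pitch, confidence


if __name__ == "__main__":
    fd = 128 / 44100.0
    # Contour A: 110 Hz for 3 frames; contour B: 220 Hz starting at frame 1.
    p, c = simple_melody([[120.0, 120.0, 120.0], [240.0, 240.0]],
                         [[0.5, 0.5, 0.5], [0.9, 0.1]],
                         [0.0, fd], 5 * fd)
    print(p)  # [110.0, 220.0, 110.0, 0.0, 0.0]
    print(c)  # [0.5, 0.9, 0.5, 0.0, 0.0]
```

In frame 1 the more salient contour B wins; in frame 2 its salience drops below A's, so A is selected again. The real algorithm first discards contours it judges to be non-melodic, which this sketch does not attempt.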

Note that “pitchConfidence” can be negative when “guessUnvoiced” is true: the absolute values represent the confidence, negative values correspond to segments for which non-salient contours were selected, and zero values correspond to unvoiced segments.
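The sign convention of "pitchConfidence" can be decoded frame by frame. A minimal sketch operating on plain Python lists standing in for the two output vectors; classify_frames, voiced_only and the label strings are hypothetical names chosen for illustration.

```python
from typing import List, Sequence


def classify_frames(pitch_confidence: Sequence[float]) -> List[str]:
    """Label each frame using the sign of its confidence value."""
    labels = []
    for c in pitch_confidence:
        if c > 0:
            labels.append("voiced")    # a salient contour was selected
        elif c < 0:
            labels.append("guessed")   # non-salient contour (guessUnvoiced)
        else:
            labels.append("unvoiced")  # no contour selected
    return labels


def voiced_only(pitch: Sequence[float],
                pitch_confidence: Sequence[float]) -> List[float]:
    """Zero out pitch in guessed and unvoiced frames (confidence <= 0)."""
    return [f if c > 0 else 0.0 for f, c in zip(pitch, pitch_confidence)]


if __name__ == "__main__":
    conf = [0.8, -0.3, 0.0]
    print(classify_frames(conf))                  # ['voiced', 'guessed', 'unvoiced']
    print([abs(c) for c in conf])                 # [0.8, 0.3, 0.0]
    print(voiced_only([200.0, 150.0, 0.0], conf))  # [200.0, 0.0, 0.0]
```

Taking abs() recovers the confidence itself, while the sign tells you whether the frame should count as voiced when computing, for example, voicing-based evaluation metrics.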

An exception is thrown when the input vectors differ in size or when the “duration” input (from which the number of frames is derived) is negative. An exception is also thrown if the inputs contain negative start times, negative bin values, or negative salience values.
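The input checks above can be mirrored on the caller's side before invoking the algorithm. A minimal sketch that raises Python's ValueError in place of the library's exception; check_inputs is a hypothetical helper, and the checks encode only the conditions stated in the text above.

```python
from typing import Sequence


def check_inputs(contours_bins: Sequence[Sequence[float]],
                 contours_saliences: Sequence[Sequence[float]],
                 contours_start_times: Sequence[float],
                 duration: float) -> None:
    """Raise ValueError under the documented error conditions."""
    # All three inputs must describe the same number of contours.
    if not (len(contours_bins) == len(contours_saliences)
            == len(contours_start_times)):
        raise ValueError("input vectors differ in size")
    # A negative duration would imply a negative number of frames.
    if duration < 0:
        raise ValueError("duration must not be negative")
    if any(t < 0 for t in contours_start_times):
        raise ValueError("negative contour start time")
    if any(b < 0 for bins in contours_bins for b in bins):
        raise ValueError("negative cent bin value")
    if any(s < 0 for sals in contours_saliences for s in sals):
        raise ValueError("negative salience value")


if __name__ == "__main__":
    check_inputs([[120.0, 121.0]], [[0.4, 0.5]], [0.0], 1.0)  # passes
    try:
        check_inputs([[120.0]], [[-0.1]], [0.0], 1.0)
    except ValueError as err:
        print(err)  # negative salience value
```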

Recommended processing chain (see [1]): EqualLoudness -> frame slicing (sample rate = 44100, frame size = 2048, hop size = 128) -> Windowing (Hann window, x4 zero padding) -> Spectrum -> SpectralPeaks -> PitchSalienceFunction -> PitchSalienceFunctionPeaks -> PitchContours.
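The recommended settings fix the time and frequency resolution of the whole chain. A small arithmetic sketch, assuming "x4 zero padding" means the padded FFT frame is four times the frame size; chain_timing and frames_for_duration are hypothetical helpers, and the frame-count rounding is an assumption.

```python
from typing import Tuple


def chain_timing(sample_rate: float = 44100.0, frame_size: int = 2048,
                 hop_size: int = 128,
                 zero_padding_factor: int = 4) -> Tuple[int, float, float, float]:
    """Return FFT size, FFT bin spacing [Hz], window length [s], hop [s]."""
    fft_size = frame_size * zero_padding_factor   # 2048 * 4 = 8192 samples
    bin_spacing_hz = sample_rate / fft_size       # about 5.38 Hz per FFT bin
    window_seconds = frame_size / sample_rate     # about 46.4 ms
    hop_seconds = hop_size / sample_rate          # about 2.9 ms between frames
    return fft_size, bin_spacing_hz, window_seconds, hop_seconds


def frames_for_duration(duration: float, sample_rate: float = 44100.0,
                        hop_size: int = 128) -> int:
    """Approximate number of pitch frames for a signal of the given duration."""
    return int(round(duration * sample_rate / hop_size))


if __name__ == "__main__":
    print(chain_timing())
    print(frames_for_duration(1.0))  # 344.53... frames per second -> 345
```

The hop size used here should match the hopSize and sampleRate parameters of this algorithm, since they determine how contour start times map to output frames.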


[1] J. Salamon and E. Gómez, “Melody extraction from polyphonic music signals using pitch contour characteristics,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 6, pp. 1759–1770, 2012.


See also

EqualLoudness (standard), EqualLoudness (streaming), Loudness (standard), Loudness (streaming), PitchContours (standard), PitchContours (streaming), PitchContoursMelody (streaming), PitchSalience (standard), PitchSalience (streaming), PitchSalienceFunction (standard), PitchSalienceFunction (streaming), PitchSalienceFunctionPeaks (standard), PitchSalienceFunctionPeaks (streaming), SpectralPeaks (standard), SpectralPeaks (streaming), Spectrum (standard), Spectrum (streaming), Windowing (standard), Windowing (streaming)