PitchContoursMelody
streaming mode | Pitch category
Inputs
contoursBins
(vector_vector_real) - array of frame-wise vectors of cent bin values representing each contour
contoursSaliences
(vector_vector_real) - array of frame-wise vectors of pitch saliences representing each contour
contoursStartTimes
(vector_real) - array of the start times of each contour [s]
duration
(real) - time duration of the input signal [s]
Outputs
pitch
(vector_real) - vector of estimated pitch values (i.e., melody) [Hz]
pitchConfidence
(vector_real) - confidence with which the pitch was detected
Parameters
binResolution
(real ∈ (0, ∞), default = 10) :salience function bin resolution [cents]
filterIterations
(integer ∈ [1, ∞), default = 3) :number of iterations for the octave error / pitch outlier filtering process
guessUnvoiced
(bool ∈ {false, true}, default = false) :estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
hopSize
(integer ∈ (0, ∞), default = 128) :the hop size with which the pitch salience function was computed
maxFrequency
(real ∈ [0, ∞), default = 20000) :the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minFrequency
(real ∈ [0, ∞), default = 80) :the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
referenceFrequency
(real ∈ (0, ∞), default = 55) :the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRate
(real ∈ (0, ∞), default = 44100) :the sampling rate of the audio signal [Hz]
voiceVibrato
(bool ∈ {true, false}, default = false) :detect voice vibrato
voicingTolerance
(real ∈ [-1.0, 1.4], default = 0.2) :allowed deviation below the average contour mean salience of all contours (fraction of the standard deviation)
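As a rough illustration of how "voicingTolerance" can be read, the sketch below computes a voicing threshold as the average of the contour mean saliences minus the tolerance times their standard deviation, following the description above and [1]. The function names `voicing_threshold` and `voiced_contours` are hypothetical, and the use of the population standard deviation is an assumption; Essentia's implementation may differ in detail.

```python
import statistics


def voicing_threshold(contour_mean_saliences, voicing_tolerance=0.2):
    """Threshold = average contour mean salience - tolerance * standard deviation.

    Assumption: population standard deviation over all contours.
    """
    average = statistics.fmean(contour_mean_saliences)
    deviation = statistics.pstdev(contour_mean_saliences)
    return average - voicing_tolerance * deviation


def voiced_contours(contour_mean_saliences, voicing_tolerance=0.2):
    """Return indices of contours whose mean salience reaches the threshold."""
    threshold = voicing_threshold(contour_mean_saliences, voicing_tolerance)
    return [i for i, s in enumerate(contour_mean_saliences) if s >= threshold]
```

Under this reading, a larger tolerance lowers the threshold, so more contours are kept as voiced; a negative tolerance raises it above the average.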
Description
This algorithm converts a set of pitch contours into a sequence of predominant f0 values in Hz by taking the value of the most predominant contour in each frame. This algorithm is intended to receive its “contoursBins”, “contoursSaliences”, and “contoursStartTimes” inputs from the PitchContours algorithm. The “duration” input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values and a vector of confidence values.
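The per-frame selection can be pictured with a simplified sketch. This is not Essentia's implementation, which also applies voicing detection and octave error / outlier filtering as described in [1]. The names `bin_to_hz` and `naive_melody`, the rule of ranking contours by their total salience, and the way the frame count is derived from "duration" are assumptions made for illustration. The cent-to-Hertz conversion uses the standard cents formula with the default "referenceFrequency" and "binResolution".

```python
def bin_to_hz(cent_bin, reference_frequency=55.0, bin_resolution=10.0):
    """Convert a cent bin index to Hz; bin 0 maps to the reference frequency."""
    return reference_frequency * 2.0 ** (cent_bin * bin_resolution / 1200.0)


def naive_melody(contours_bins, contours_saliences, contours_start_times,
                 duration, sample_rate=44100.0, hop_size=128):
    """Pick, for each frame, the pitch of the contour with the largest total salience.

    Frames not covered by any contour keep a pitch of 0.0.
    """
    hop_duration = hop_size / sample_rate
    number_frames = int(round(duration / hop_duration))  # assumed frame count
    pitch = [0.0] * number_frames
    best_salience = [0.0] * number_frames

    for bins, saliences, start in zip(contours_bins, contours_saliences,
                                      contours_start_times):
        total = sum(saliences)
        first_frame = int(round(start / hop_duration))
        for offset, cent_bin in enumerate(bins):
            frame = first_frame + offset
            if frame < number_frames and total > best_salience[frame]:
                best_salience[frame] = total
                pitch[frame] = bin_to_hz(cent_bin)
    return pitch
```

With the defaults, bin 120 lies 1200 cents (one octave) above the 55 Hz reference, so it maps to 110 Hz.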
Note that “pitchConfidence” can be negative when “guessUnvoiced”=True. In that case the absolute value represents the confidence, negative values correspond to segments for which non-salient contours were selected, and zero values correspond to non-voiced segments.
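A small sketch of how a caller might interpret the sign convention of "pitchConfidence". The helper names `interpret_confidence` and `voiced_only` are hypothetical and not part of Essentia; they only encode the rule stated above.

```python
def interpret_confidence(confidence):
    """Map a pitchConfidence value to a (label, magnitude) pair."""
    if confidence > 0:
        return "salient", confidence
    if confidence < 0:
        # Negative sign marks a pitch guessed from a non-salient contour.
        return "non-salient", -confidence
    return "unvoiced", 0.0


def voiced_only(pitch, pitch_confidence):
    """Zero out frames whose pitch did not come from a salient contour."""
    return [p if c > 0 else 0.0 for p, c in zip(pitch, pitch_confidence)]
```

For example, `voiced_only` can be used to discard the guessed values again when only salient melody frames are of interest.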
An exception is thrown when the input vectors differ in size or when “numberFrames” is negative. An exception is also thrown when the input vectors contain negative start indices or negative bin or salience values.
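The checks above could be mirrored on the caller's side before passing data to the algorithm. The following is a minimal sketch under stated assumptions: `validate_inputs` is a hypothetical helper, it treats a negative "duration" as the negative frame count case, and it also requires each contour's bin and salience vectors to have equal length, which the text above does not state explicitly.

```python
def validate_inputs(contours_bins, contours_saliences, contours_start_times,
                    duration):
    """Raise ValueError if the inputs violate the documented constraints."""
    if not (len(contours_bins) == len(contours_saliences)
            == len(contours_start_times)):
        raise ValueError("contour inputs differ in size")
    if duration < 0:
        # Assumption: a negative duration would yield a negative frame count.
        raise ValueError("duration must not be negative")
    for bins, saliences, start in zip(contours_bins, contours_saliences,
                                      contours_start_times):
        if len(bins) != len(saliences):
            # Assumption: per-contour vectors must also match in length.
            raise ValueError("bins and saliences of a contour differ in size")
        if start < 0:
            raise ValueError("contour start must not be negative")
        if any(b < 0 for b in bins) or any(s < 0 for s in saliences):
            raise ValueError("bin and salience values must not be negative")
```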
Recommended processing chain (see [1]): EqualLoudness -> frame slicing with sample rate = 44100, frame size = 2048, hop size = 128 -> Windowing with Hann, x4 zero padding -> Spectrum -> SpectralPeaks -> PitchSalienceFunction -> PitchSalienceFunctionPeaks -> PitchContours.
- References:
[1] J. Salamon and E. Gómez, “Melody extraction from polyphonic music signals using pitch contour characteristics,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 6, pp. 1759–1770, 2012.
See also
EqualLoudness (standard), EqualLoudness (streaming), Loudness (standard), Loudness (streaming), PitchContours (standard), PitchContours (streaming), PitchContoursMelody (standard), PitchSalience (standard), PitchSalience (streaming), PitchSalienceFunction (standard), PitchSalienceFunction (streaming), PitchSalienceFunctionPeaks (standard), PitchSalienceFunctionPeaks (streaming), SpectralPeaks (standard), SpectralPeaks (streaming), Spectrum (standard), Spectrum (streaming), Windowing (standard), Windowing (streaming)