PitchCREPE

streaming mode | Pitch category

Inputs

  • audio (vector_real) - the input audio signal sampled at 16000 Hz

Outputs

  • time (vector_real) - the timestamps on which the pitch was estimated
  • frequency (vector_real) - the predicted pitch values in Hz
  • confidence (vector_real) - the confidence of voice activity, between 0 and 1
  • activations (vector_vector_real) - the raw activation matrix

Parameters

  • batchSize (integer ∈ [-1, ∞), default = 64) :
    the batch size for prediction. This allows parallelization when a GPU are available. Set it to -1 or 0 to accumulate all the patches and run a single TensorFlow session at the end
  • graphFilename (string, default = "") :
    the name of the file from which to load the TensorFlow graph
  • hopSize (real ∈ (0, ∞), default = 10) :
    the hop size in milliseconds for running pitch estimation
  • input (string, default = frames) :
    the name of the input node in the TensorFlow graph
  • output (string, default = model/classifier/Sigmoid) :
    the name of the node from which to retrieve the output tensors
  • savedModel (string, default = "") :
    the name of the TensorFlow SavedModel. Overrides parameter graphFilename

Description

This algorithm estimates pitch of monophonic audio signals using CREPE models.

This algorithm is a wrapper to post-process the activations generated by TensorflowPredictCREPE. time contains the timestamps in which the pitch was estimated. frequency is the vector of pitch estimations in Hz. confidence expresses the confidence in the presence of pitch for each timestamp as value between 0 to 1. activations is a time by sigmoid activations matrix returned by the neural network.

See TensorflowPredictCREPE for details about the rest of parameters. The recommended pipeline is as follows:

MonoLoader(sampleRate=16000) >> PitchCREPE()

Notes: This algorithm does not make any check on the input model so it is the user's responsibility to make sure it is a valid one. The required sample rate of input signal is 16 KHz. Other sample rates will lead to an incorrect behavior.

References:

  1. CREPE: A Convolutional Representation for Pitch Estimation. Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
  2. Original models and code at https://github.com/marl/crepe/
  3. Supported models at https://essentia.upf.edu/models/

See also

MonoLoader (standard) MonoLoader (streaming) PitchCREPE (standard) TensorflowPredict (standard) TensorflowPredict (streaming) TensorflowPredictCREPE (standard) TensorflowPredictCREPE (streaming)

Streaming algorithms

AfterMaxToBeforeMaxEnergyRatio | AllPass | AudioLoader | AudioOnsetsMarker | AudioWriter | AutoCorrelation | BFCC | BPF | BandPass | BandReject | BarkBands | BarkExtractor | BeatTrackerDegara | BeatTrackerMultiFeature | Beatogram | BeatsLoudness | BinaryOperator | BinaryOperatorStream | BpmHistogram | BpmHistogramDescriptors | BpmRubato | CartesianToPolar | CentralMoments | Centroid | ChordsDescriptors | ChordsDetection | ChromaCrossSimilarity | Chromagram | Chromaprinter | ClickDetector | Clipper | ConstantQ | CoverSongSimilarity | Crest | CrossCorrelation | CubicSpline | DCRemoval | DCT | Danceability | Decrease | Derivative | DerivativeSFX | DiscontinuityDetector | Dissonance | DistributionShape | Duration | DynamicComplexity | ERBBands | EasyLoader | EffectiveDuration | Energy | EnergyBand | EnergyBandRatio | Entropy | Envelope | EqloudLoader | EqualLoudness | FFT | FFTC | FadeDetection | FalseStereoDetector | FileOutput | Flatness | FlatnessDB | FlatnessSFX | Flux | FrameCutter | FrameToReal | FrequencyBands | GFCC | GapsDetector | GeometricMean | HFC | HPCP | HarmonicBpm | HarmonicMask | HarmonicModelAnal | HarmonicPeaks | HighPass | HighResolutionFeatures | Histogram | HprModelAnal | HpsModelAnal | HumDetector | IDCT | IFFT | IFFTC | IIR | Inharmonicity | InstantPower | Key | KeyExtractor | LPC | Larm | Leq | LevelExtractor | LogAttackTime | LogSpectrum | LoopBpmConfidence | LoopBpmEstimator | Loudness | LoudnessEBUR128 | LoudnessEBUR128Filter | LoudnessVickers | LowLevelSpectralEqloudExtractor | LowLevelSpectralExtractor | LowPass | MFCC | Magnitude | MaxFilter | MaxMagFreq | MaxToTotal | Mean | Median | MedianFilter | MelBands | MetadataReader | Meter | MinMax | MinToTotal | MonoLoader | MonoMixer | MonoWriter | MovingAverage | MultiPitchMelodia | Multiplexer | NNLSChroma | NSGConstantQ | NSGConstantQStreaming | NSGIConstantQ | NoiseAdder | NoiseBurstDetector | NoveltyCurve | OddToEvenHarmonicEnergyRatio | OnsetDetection | OnsetDetectionGlobal | OnsetRate | Onsets | OverlapAdd | Panning | PeakDetection | PercivalBpmEstimator | PercivalEnhanceHarmonics | PercivalEvaluatePulseTrains | PitchCREPE | PitchContours | PitchContoursMelody | PitchContoursMonoMelody | PitchContoursMultiMelody | PitchFilter | PitchMelodia | PitchSalience | PitchSalienceFunction | PitchSalienceFunctionPeaks | PitchYin | PitchYinFFT | PitchYinProbabilistic | PitchYinProbabilities | PitchYinProbabilitiesHMM | PolarToCartesian | PoolAggregator | PoolToTensor | PowerMean | PowerSpectrum | PredominantPitchMelodia | RMS | RawMoments | RealAccumulator | ReplayGain | Resample | ResampleFFT | RhythmDescriptors | RhythmExtractor | RhythmExtractor2013 | RhythmTransform | RollOff | SBic | SNR | SaturationDetector | Scale | SilenceRate | SineModelAnal | SineModelSynth | SineSubtraction | SingleBeatLoudness | SingleGaussian | Slicer | SpectralCentroidTime | SpectralComplexity | SpectralContrast | SpectralPeaks | SpectralWhitening | Spectrum | SpectrumCQ | SpectrumToCent | Spline | SprModelAnal | SprModelSynth | SpsModelAnal | SpsModelSynth | StartStopCut | StartStopSilence | StereoDemuxer | StereoMuxer | StereoTrimmer | StochasticModelAnal | StochasticModelSynth | StrongDecay | StrongPeak | SuperFluxExtractor | SuperFluxNovelty | SuperFluxPeaks | TCToTotal | TempoCNN | TempoScaleBands | TempoTap | TempoTapDegara | TempoTapMaxAgreement | TempoTapTicks | TensorNormalize | TensorToPool | TensorToVectorReal | TensorTranspose | TensorflowInputFSDSINet | TensorflowInputMusiCNN | TensorflowInputTempoCNN | TensorflowInputVGGish | TensorflowPredict | TensorflowPredict2D | TensorflowPredictCREPE | TensorflowPredictEffnetDiscogs | TensorflowPredictFSDSINet | TensorflowPredictMAEST | TensorflowPredictMusiCNN | TensorflowPredictTempoCNN | TensorflowPredictVGGish | TonalExtractor | TriangularBands | TriangularBarkBands | Trimmer | Tristimulus | TruePeakDetector | TuningFrequency | TuningFrequencyExtractor | UnaryOperator | UnaryOperatorStream | Variance | VectorInput | VectorRealAccumulator | VectorRealToTensor | Vibrato | Viterbi | WarpedAutoCorrelation | Welch | Windowing | ZeroCrossingRate