Algorithm reference - TensorflowPredictCREPE (standard mode)
Inputs:

signal(vector_real) - the input audio signal sampled at 16 kHz

Outputs:

predictions(vector_vector_real) - the output values from the model node named after output

Parameters:

batchSize(integer ∈ [-1, ∞), default = -1) :
the batch size for prediction. This allows parallelization when GPUs are available. Set it to -1 to accumulate all the patches and run a single TensorFlow session at the end of the stream

graphFilename(string, default = "") :
the name of the file containing the model to use

hopSize(real ∈ (0, ∞), default = 10) :
the hop size in milliseconds for running pitch estimations

input(string, default = frames) :
the name of the input node in the TensorFlow graph

output(string, default = model/classifier/Sigmoid) :
the name of the node from which to retrieve the output tensors
This algorithm generates frame-wise activations of monophonic audio signals using CREPE models.

input and output are the names of the input and output nodes in the neural network and default to the names used by the official models. hopSize sets the rate of pitch estimations. batchSize controls how many patches are processed per TensorFlow session run: by default (-1) all patches are accumulated and processed in a single run at the end of the audio stream, but a positive value can be set to process batches periodically for online applications.
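The timing implied by hopSize can be sketched in plain Python, independently of Essentia: at the fixed 16 kHz input rate, the default 10 ms hop corresponds to 160 samples between consecutive pitch estimates, so the number of output rows grows linearly with the audio duration. The estimate count below is an approximation, since it ignores model-specific padding at the signal edges.

```python
SAMPLE_RATE = 16000  # required input rate for CREPE models

def hop_in_samples(hop_size_ms: float) -> int:
    """Convert the hopSize parameter (milliseconds) to samples at 16 kHz."""
    return round(SAMPLE_RATE * hop_size_ms / 1000)

def num_estimates(num_samples: int, hop_size_ms: float = 10) -> int:
    """Approximate number of pitch estimates for a signal of num_samples.

    This is a sketch: the exact count depends on padding details of the
    model wrapper, which are not specified here.
    """
    hop = hop_in_samples(hop_size_ms)
    return 1 + (num_samples - 1) // hop

print(hop_in_samples(10))          # 160 samples between estimates
print(num_estimates(SAMPLE_RATE))  # ~100 estimates for 1 s of audio
```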
The recommended pipeline is as follows:
MonoLoader(sampleRate=16000) >> TensorflowPredictCREPE()
Notes: this algorithm performs no checks on the input model, so it is the user's responsibility to ensure it is valid. The input signal must be sampled at 16 kHz; other sample rates will lead to incorrect behavior.
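Each output row is a 360-dimensional activation vector over 20-cent pitch bins. As a minimal sketch of turning one activation frame into a frequency, the bin-to-cents mapping below is taken from the original marl/crepe code (not from Essentia), and the naive argmax decoding shown here is a simplification: the original code refines the estimate with a local weighted average over the activations.

```python
import numpy as np

# Bin-to-cents mapping used by the original CREPE code: 360 bins spaced
# 20 cents apart, offset so that bin 0 sits at ~31.7 Hz (near C1).
CENTS_MAPPING = np.linspace(0, 7180, 360) + 1997.3794084376191

def bin_to_hz(bin_index: int) -> float:
    """Convert an activation bin index to a frequency in Hz (10 Hz reference)."""
    return 10 * 2 ** (CENTS_MAPPING[bin_index] / 1200)

def decode_frame(activation: np.ndarray) -> float:
    """Naive decoding: frequency of the most activated bin.

    A sketch only; the original CREPE code averages cents locally around
    the peak instead of taking a hard argmax.
    """
    return bin_to_hz(int(np.argmax(activation)))

# Example: a synthetic activation frame peaking at bin 120.
frame = np.zeros(360)
frame[120] = 1.0
print(round(decode_frame(frame), 1))  # ≈ 126.8 Hz
```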
References:

CREPE: A Convolutional Representation for Pitch Estimation. Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
Original models and code at https://github.com/marl/crepe/
Supported models at https://essentia.upf.edu/models/