TensorflowPredictVGGish

standard mode | Machine Learning category

Inputs

  • signal (vector_real) - the input audio signal sampled at 16 kHz

Outputs

  • predictions (vector_vector_real) - the predictions

Parameters

  • accumulate (bool ∈ {true, false}, default = false) :
    when true, a single TensorFlow session is run at the end of the stream; otherwise, a session is run for every new patch
  • graphFilename (string) :
    the name of the file containing the model to use
  • input (string, default = model/Placeholder) :
    the name of the input node in the Tensorflow graph
  • isTrainingName (string, default = "") :
    the name of an additional input node indicating whether the model is to be run in training mode. Set it only for models that have a training mode; leave it empty otherwise
  • lastPatchMode (string ∈ {discard, repeat}, default = discard) :
    what to do with the last frames. Options are to repeat them to fill the last patch or to discard them
  • output (string, default = model/Sigmoid) :
    the name of the node from which to retrieve the output tensors
  • patchHopSize (integer ∈ [0, ∞), default = 93) :
    number of frames between the beginnings of adjacent patches. 0 to avoid overlap
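
As a rough illustration of how patchHopSize determines which frames end up in a patch, the sketch below lists patch start indices for the default lastPatchMode (discard). It is not taken from the Essentia sources: the helper names are hypothetical, the patch size of 96 frames comes from the Description, and treating a hop of 0 as "hop equals patch size" is an interpretation of "0 to avoid overlap".

```python
PATCH_SIZE = 96  # frames per patch, as stated in the Description


def patch_starts(n_frames, patch_hop_size=93, patch_size=PATCH_SIZE):
    """Index of the first frame of every complete patch (hypothetical helper)."""
    # Assumption: a hop of 0 means back-to-back patches without overlap.
    hop = patch_hop_size if patch_hop_size > 0 else patch_size
    # A patch starting at s is complete only if s + patch_size <= n_frames.
    return list(range(0, n_frames - patch_size + 1, hop))


def dropped_frames(n_frames, patch_hop_size=93, patch_size=PATCH_SIZE):
    """Trailing frames that belong to no complete patch in discard mode."""
    starts = patch_starts(n_frames, patch_hop_size, patch_size)
    covered = starts[-1] + patch_size if starts else 0
    return n_frames - covered


print(patch_starts(300))       # [0, 93, 186]
print(dropped_frames(300))     # 18
print(patch_starts(300, 0))    # [0, 96, 192]
print(dropped_frames(300, 0))  # 12
```

With the default hop of 93, adjacent patches share 3 frames. In repeat mode the trailing frames would be used to fill a final patch instead of being dropped; that case is not modeled here.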

Description

This algorithm makes predictions using VGGish-based models. Internally, it uses TensorflowInputVGGish for the input feature extraction (mel bands). It feeds the model with patches of 96 mel-band frames and advances by a constant number of frames set by patchHopSize. When accumulate is true, the patches are stored and a single TensorFlow session is run at the end of the stream. This makes it possible to take advantage of parallelization when GPUs are available, but it can exhaust memory on long files. The recommended pipeline is as follows:

MonoLoader(sampleRate=16000) >> TensorflowPredictVGGish
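
In Python, the pipeline above might look like the following minimal sketch. It assumes an Essentia build with TensorFlow support; the function name predict_vggish and both file paths are hypothetical, and the node names are simply the defaults listed under Parameters, which may not match every model.

```python
def predict_vggish(audio_path, graph_path, patch_hop_size=93):
    """Load audio at 16 kHz and run a VGGish-based model on it (sketch)."""
    # Imported lazily so that this sketch can be loaded without Essentia.
    from essentia.standard import MonoLoader, TensorflowPredictVGGish

    # The algorithm expects a mono signal sampled at 16 kHz.
    audio = MonoLoader(filename=audio_path, sampleRate=16000)()

    model = TensorflowPredictVGGish(
        graphFilename=graph_path,   # hypothetical path to a frozen graph
        input="model/Placeholder",  # default input node name
        output="model/Sigmoid",     # default output node name
        patchHopSize=patch_hop_size,
    )
    # One row of predictions per patch.
    return model(audio)
```

A call such as predict_vggish("audio.wav", "model.pb") would return the predictions output described above. Since the algorithm does not validate the model, the node names should be checked against the graph being loaded.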

Note: This algorithm does not make any check on the input model so it is the user's responsibility to make sure it is a valid one.

References:

[1] Gemmeke, J. et al., AudioSet: An ontology and human-labelled dataset for audio events, ICASSP 2017

[2] Hershey, S. et al., CNN Architectures for Large-Scale Audio Classification, ICASSP 2017

[3] Supported models at https://essentia.upf.edu/models/

See also

MonoLoader (standard) | MonoLoader (streaming) | Scale (standard) | Scale (streaming) | TensorflowInputVGGish (standard) | TensorflowInputVGGish (streaming) | TensorflowPredict (standard) | TensorflowPredict (streaming) | TensorflowPredictVGGish (streaming)
