MFCC

streaming mode | Spectral category

Inputs

  • spectrum (vector_real) - the audio spectrum

Outputs

  • bands (vector_real) - the energies in mel bands
  • mfcc (vector_real) - the mel frequency cepstrum coefficients

Parameters

  • dctType (integer ∈ [2, 3], default = 2) :
    the DCT type
  • highFrequencyBound (real ∈ (0, ∞), default = 11000) :
    the upper bound of the frequency range [Hz]
  • inputSize (integer ∈ (1, ∞), default = 1025) :
    the size of the input spectrum
  • liftering (integer ∈ [0, ∞), default = 0) :
    the liftering coefficient. Use '0' to bypass it
  • logType (string ∈ {natural, dbpow, dbamp, log}, default = dbamp) :
    logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes
  • lowFrequencyBound (real ∈ [0, ∞), default = 0) :
    the lower bound of the frequency range [Hz]
  • normalize (string ∈ {unit_sum, unit_max}, default = unit_sum) :
    'unit_max' makes the vertex (peak) of every triangle equal to 1; 'unit_sum' makes the area of every triangle equal to 1
  • numberBands (integer ∈ [1, ∞), default = 40) :
    the number of mel bands in the filterbank
  • numberCoefficients (integer ∈ [1, ∞), default = 13) :
    the number of output mel-frequency cepstrum coefficients
  • sampleRate (real ∈ (0, ∞), default = 44100) :
    the sampling rate of the audio signal [Hz]
  • type (string ∈ {magnitude, power}, default = power) :
    use magnitude or power spectrum
  • warpingFormula (string ∈ {slaneyMel, htkMel}, default = slaneyMel) :
    the mel-scale implementation type. Use 'htkMel' to emulate HTK's behaviour.
  • weighting (string ∈ {warping, linear}, default = warping) :
    type of weighting function for determining triangle area
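The two warpingFormula options differ in how frequency is mapped to mel before the triangular filters are spaced evenly on the mel axis. The sketch below uses the standard textbook forms of the two scales: HTK's 2595·log10(1 + f/700), and Slaney's Auditory Toolbox scale (linear below 1 kHz, logarithmic above). That the library's slaneyMel and htkMel use exactly these constants is an assumption, and the function names are hypothetical.

```python
import math

def hz_to_mel_htk(f_hz):
    """HTK-style mel scale (textbook form, assumed to match 'htkMel')."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def hz_to_mel_slaney(f_hz):
    """Slaney-style mel scale (textbook form, assumed to match 'slaneyMel')."""
    min_log_hz = 1000.0                    # boundary between linear and log regions
    lin_slope = 3.0 / 200.0                # 200/3 Hz per mel in the linear region
    min_log_mel = min_log_hz * lin_slope   # 15 mels at 1 kHz
    log_step = math.log(6.4) / 27.0        # 27 mels per factor of 6.4 in frequency
    if f_hz < min_log_hz:
        return f_hz * lin_slope
    return min_log_mel + math.log(f_hz / min_log_hz) / log_step
```

The absolute mel values differ (about 1000 mel vs. 15 mel at 1 kHz), but only the shape of each curve matters, since the band edges are placed evenly on whichever mel axis is chosen.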

Description

This algorithm computes the mel-frequency cepstrum coefficients of a spectrum. As there is no standard implementation, the MFCC-FB40 variant is used by default:

  • filterbank of 40 bands from 0 to 11000 Hz
  • take the log value of the spectrum energy in each mel band
  • DCT of the 40 bands down to 13 mel coefficients
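The last two steps can be sketched in plain Python, starting from already computed mel band energies. This is a minimal sketch, not the library's code: it assumes the DCT type 2 is the orthonormal DCT-II, and the silence floor of 1e-10 and the function names are hypothetical.

```python
import math

def compress(x, log_type="dbamp", floor=1e-10):
    """Log compression of one band energy; the floor (an assumption) avoids log(0)."""
    x = max(x, floor)
    if log_type == "dbamp":
        return 20.0 * math.log10(x)
    if log_type == "dbpow":
        return 10.0 * math.log10(x)
    if log_type in ("natural", "log"):
        # Simplification: both are treated as the natural log here; the library
        # may handle near-silent bands differently for 'log'.
        return math.log(x)
    raise ValueError("unknown log type: " + log_type)

def dct2_ortho(x, n_out):
    """Orthonormal DCT-II of x, keeping only the first n_out coefficients."""
    n = len(x)
    out = []
    for k in range(n_out):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n)
                               for j in range(n)))
    return out

def mfcc_from_bands(bands, n_coeffs=13, log_type="dbamp"):
    """Log-compress each mel band, then keep the first n_coeffs DCT coefficients."""
    log_bands = [compress(b, log_type) for b in bands]
    return dct2_ortho(log_bands, n_coeffs)
```

For example, mfcc_from_bands([10.0] * 40) yields a single nonzero coefficient c0 = 20·sqrt(40), because a flat log spectrum has no variation for the higher-order cosines to capture.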

Several MFCC implementations are compared in [1].

The parameters of this algorithm can be configured as follows to behave like HTK [3]:

  • type = 'magnitude'
  • warpingFormula = 'htkMel'
  • weighting = 'linear'
  • highFrequencyBound = 8000
  • numberBands = 26
  • numberCoefficients = 13
  • normalize = 'unit_max'
  • dctType = 3
  • logType = 'log'
  • liftering = 22
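The liftering coefficient of 22 corresponds to HTK's sinusoidal lifter, which rescales cepstral coefficient n by 1 + (L/2)·sin(πn/L). The sketch below applies that formula; that this library uses exactly the same weighting is an assumption, and the function name is hypothetical.

```python
import math

def lifter(mfcc, L=22):
    """HTK-style sinusoidal liftering of a list of cepstral coefficients.

    L = 0 bypasses liftering, mirroring the 'liftering' parameter above.
    """
    if L == 0:
        return list(mfcc)
    return [c * (1.0 + (L / 2.0) * math.sin(math.pi * n / L))
            for n, c in enumerate(mfcc)]
```

The weight is 1 for c0 and peaks at 1 + L/2 = 12 for n = 11, boosting the mid-order coefficients relative to the low-order ones.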

To behave exactly like HTK, the audio signal has to be scaled by 2^15 (= 32768) before processing, and if the FrameCutter (or FrameGenerator) and Windowing algorithms are used, they should also be configured as follows.

FrameGenerator:

  • frameSize = 1102
  • hopSize = 441
  • startFromZero = True
  • validFrameThresholdRatio = 1
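At the default 44100 Hz sample rate these values give frames of about 25 ms (1102 samples) with a 10 ms hop (441 samples). The sketch below models the two flags as this document suggests: startFromZero = True starts the first frame at sample 0, and validFrameThresholdRatio = 1 keeps only frames made entirely of signal samples. It is a simplified, hypothetical model, not the library's frame cutter.

```python
def cut_frames(signal, frame_size=1102, hop_size=441):
    """Split a list of samples into complete, possibly overlapping frames.

    Frames start at sample 0 (startFromZero = True) and any frame that would
    run past the end of the signal is dropped (validFrameThresholdRatio = 1).
    """
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += hop_size
    return frames
```

One second of audio at 44100 Hz yields (44100 − 1102) // 441 + 1 = 98 complete frames, and a signal shorter than one frame yields none.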

Windowing:

  • type = 'hamming'
  • size = 1102
  • zeroPadding = 946
  • normalized = False
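With zeroPadding = 946, each 1102-sample frame becomes 1102 + 946 = 2048 samples, so a real FFT yields 2048/2 + 1 = 1025 bins, which matches the default inputSize. The sketch below combines the 2^15 scaling, an unnormalized Hamming window and the zero padding. Assumptions: it uses the textbook Hamming coefficients 0.54/0.46 (the library's exact coefficients may differ), it simply appends the zeros and ignores any zero-phase rearrangement the library may apply, and the function name is hypothetical. Scaling each frame is equivalent to scaling the whole signal, since the operation is linear.

```python
import math

def htk_like_frame(frame, zero_padding=946):
    """Scale a frame by 2^15, apply an unnormalized Hamming window, then zero-pad."""
    n = len(frame)
    scaled = [s * 2 ** 15 for s in frame]  # float range -> 16-bit integer range
    # Textbook Hamming window, not normalized (normalized = False)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [s * w for s, w in zip(scaled, window)]
    return windowed + [0.0] * zero_padding
```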

This algorithm depends on the MelBands and DCT algorithms and therefore inherits their parameter restrictions; an exception is thrown if any of these restrictions is not met. The input "spectrum" is passed to MelBands and is therefore subject to MelBands' input requirements. Exceptions are inherited from both MelBands and DCT.

The IDCT algorithm can be used to compute smoothed mel bands as follows:

  • compute MFCC
  • smoothedMelBands = 10^(IDCT(MFCC)/20)

Note: the second step assumes that 'logType' = 'dbamp' was used to compute the MFCCs; for any other logType, the formula must be changed so that it inverts the corresponding compression.
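A minimal sketch of the two steps, assuming the MFCCs came from an orthonormal DCT-II over 40 bands with logType = 'dbamp'. The inverse is the matching DCT-III, and coefficients that were not kept are treated as zero. Because only 13 of 40 coefficients survive, the result is a smoothed version of the original band energies rather than an exact reconstruction. The function names are hypothetical.

```python
import math

def idct_ortho(coeffs, n_out):
    """Inverse of the orthonormal DCT-II; missing high-order coefficients count as zero."""
    out = []
    for j in range(n_out):
        acc = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n_out) if k == 0 else math.sqrt(2.0 / n_out)
            acc += scale * c * math.cos(math.pi * k * (j + 0.5) / n_out)
        out.append(acc)
    return out

def smoothed_mel_bands(mfcc, number_bands=40):
    """10^(IDCT(MFCC)/20): undoes the 'dbamp' compression (20*log10)."""
    return [10.0 ** (v / 20.0) for v in idct_ortho(mfcc, number_bands)]
```

For example, an MFCC vector whose only nonzero entry is c0 = 20·sqrt(40) maps back to 40 bands with energy 10 each.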

References:

[1] T. Ganchev, N. Fakotakis, and G. Kokkinakis, "Comparative evaluation of various MFCC implementations on the speaker verification task," in International Conference on Speech and Computer (SPECOM’05), 2005, vol. 1, pp. 191–194.

[2] Mel-frequency cepstrum - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Mel_frequency_cepstral_coefficient

[3] Young, S. J., Evermann, G., Gales, M. J. F., Hain, T., Kershaw, D., Liu, X., … Woodland, P. C. (2009). The HTK Book (for HTK Version 3.4). Cambridge University Engineering Department. http://htk.eng.cam.ac.uk
