Updates to cepstral features (MFCC and GFCC)

Fri, 30/12/2016 - 19:15

Working towards the next Essentia release, we have updated our cepstral features. The updates include:

  • Support for extracting MFCCs 'the HTK way' (Python example).

    In the literature there are two common MFCC 'standards', which differ in some parameters and in the mel-scale computation itself: the Slaney way (Auditory Toolbox) and the HTK way (chapter 5.4 of the HTK Book).

    See a Python notebook for a comparison with MFCCs extracted with librosa and with HTK.

  • Support for inverting the computed MFCCs back to the spectral (mel) domain (Python example).

    The first MFCC coefficients are a standard way of describing singing voice timbre. The MFCC feature vector, however, does not represent the singing voice well visually. Instead, it is common practice to invert the first 12-15 MFCC coefficients back to the mel-band domain for visualization. We have ported invmelfcc.m as explained here.

  • Support for the cent scale.
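
To make the difference between the two mel-scale computations mentioned above concrete, here is a minimal sketch of both Hz-to-mel mappings in plain Python. The function names are hypothetical, and the Slaney constants follow the convention commonly used for Auditory Toolbox style filterbanks (linear below 1 kHz, logarithmic above); this is an illustration, not Essentia's own code, and details there may differ.

```python
import math

def hz_to_mel_htk(hz):
    """HTK-style mel scale: a single logarithmic formula."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

# Slaney-style constants (assumed here, as in Auditory Toolbox style filterbanks).
LINEAR_STEP_HZ = 200.0 / 3.0         # Hz per mel in the linear region
BREAK_HZ = 1000.0                    # switch from linear to log spacing
BREAK_MEL = 15.0                     # BREAK_HZ / LINEAR_STEP_HZ
LOG_STEP = math.log(6.4) / 27.0      # log-region step size per mel

def hz_to_mel_slaney(hz):
    """Slaney-style mel scale: linear below 1 kHz, logarithmic above."""
    if hz < BREAK_HZ:
        return hz / LINEAR_STEP_HZ
    return BREAK_MEL + math.log(hz / BREAK_HZ) / LOG_STEP

if __name__ == "__main__":
    # Print both mappings side by side for a few frequencies.
    for hz in (100.0, 500.0, 1000.0, 4000.0, 8000.0):
        print(hz, hz_to_mel_htk(hz), hz_to_mel_slaney(hz))
```

Note that the two scales use different units, so only the shape of the warping is comparable, not the absolute mel values. This is one reason why MFCCs computed the two ways do not match numerically.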

You can start using these features before the official release by building Essentia from the master branch.
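
For readers curious about what inverting MFCCs back to mel bands involves, the idea can be sketched in plain Python as follows. The sketch assumes an orthonormal DCT-II over natural-log band energies and ignores liftering, log-type and DCT-type options. The function names are hypothetical, and this is neither the invmelfcc.m port nor Essentia's implementation.

```python
import math

def dct_basis(n_coeffs, n_bands):
    """First n_coeffs rows of the orthonormal DCT-II basis over n_bands points."""
    rows = []
    for k in range(n_coeffs):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n_bands)
        rows.append([scale * math.cos(math.pi * k * (2 * n + 1) / (2 * n_bands))
                     for n in range(n_bands)])
    return rows

def mel_to_mfcc(mel_bands, n_coeffs):
    """Log-compress the mel band energies and keep the first n_coeffs DCT coefficients."""
    log_bands = [math.log(e) for e in mel_bands]
    basis = dct_basis(n_coeffs, len(mel_bands))
    return [sum(b * x for b, x in zip(row, log_bands)) for row in basis]

def mfcc_to_mel(mfcc, n_bands):
    """Invert: apply the transposed basis (missing coefficients count as zero), then exponentiate."""
    basis = dct_basis(len(mfcc), n_bands)
    log_bands = [sum(basis[k][n] * mfcc[k] for k in range(len(mfcc)))
                 for n in range(n_bands)]
    return [math.exp(v) for v in log_bands]

if __name__ == "__main__":
    bands = [1.0, 2.0, 4.0, 3.0, 1.5, 0.5, 0.25, 0.8]  # toy mel band energies
    smoothed = mfcc_to_mel(mel_to_mfcc(bands, 4), len(bands))
    print(smoothed)  # a smoothed version of the original envelope
```

Keeping all coefficients recovers the band energies exactly. Keeping only the first few yields a smoothed spectral envelope, which is what makes the inverted representation useful for visualization.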