Melody detection

In this tutorial we will analyse the pitch contour of the predominant melody in an audio recording using the PredominantPitchMelodia algorithm. This algorithm outputs a time series of instantaneous pitch values (in Hz) for the perceived melody. It can be used with both monophonic and polyphonic signals.

# For embedding audio player
import IPython

# Plots
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (15, 6)

import numpy
import essentia.standard as es

audiofile = '../../../test/audio/recorded/flamenco.mp3'

# Load audio file.
# It is recommended to apply equal-loudness filter for PredominantPitchMelodia.
loader = es.EqloudLoader(filename=audiofile, sampleRate=44100)
audio = loader()
print("Duration of the audio sample [sec]:")
print(len(audio)/44100.0)

# Extract the pitch curve
# PredominantPitchMelodia takes the entire audio signal as input (no frame-wise processing is required).

pitch_extractor = es.PredominantPitchMelodia(frameSize=2048, hopSize=128)
pitch_values, pitch_confidence = pitch_extractor(audio)

# Pitch is estimated on frames. Compute frame time positions.
# With hopSize=128 at 44100 Hz, estimates are 128/44100 ≈ 2.9 ms apart,
# so a linspace over the audio duration closely approximates them.
pitch_times = numpy.linspace(0.0, len(audio)/44100.0, len(pitch_values))

# Plot the estimated pitch contour and confidence over time.
f, axarr = plt.subplots(2, sharex=True)
axarr[0].plot(pitch_times, pitch_values)
axarr[0].set_title('estimated pitch [Hz]')
axarr[1].plot(pitch_times, pitch_confidence)
axarr[1].set_title('pitch confidence')
plt.show()
Duration of the audio sample [sec]:
14.22859410430839
[Figure: estimated pitch [Hz] (top) and pitch confidence (bottom) over time]

Zero pitch values correspond to unvoiced audio segments, for which pitch confidence is very low according to the algorithm's estimation. You can force pitch estimation on those segments as well by enabling the guessUnvoiced parameter.
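
For example, a minimal sketch of that option (same frame and hop sizes as above):

# Estimate pitch also on segments classified as unvoiced
# instead of returning zeros for them.
pitch_extractor_unvoiced = es.PredominantPitchMelodia(frameSize=2048, hopSize=128,
                                                      guessUnvoiced=True)
pitch_values_unvoiced, _ = pitch_extractor_unvoiced(audio)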

Let's listen to the estimated pitch and compare it to the original audio. To this end, we will generate a sine wave following the estimated pitch using the pitch_contour function from the mir_eval Python package (install it with pip install mir_eval to run this code).

IPython.display.Audio(audiofile)

from mir_eval.sonify import pitch_contour

from tempfile import TemporaryDirectory
temp_dir = TemporaryDirectory()

# Essentia operates with float32 ndarrays instead of float64, so let's cast it.
synthesized_melody = pitch_contour(pitch_times, pitch_values, 44100).astype(numpy.float32)[:len(audio)]
# Mux the original audio (left channel) and the synthesized melody (right channel) into a stereo file.
es.AudioWriter(filename=temp_dir.name + '/flamenco_melody.mp3', format='mp3')(es.StereoMuxer()(audio, synthesized_melody))

IPython.display.Audio(temp_dir.name + '/flamenco_melody.mp3')

Note segmentation and converting to MIDI

The PredominantPitchMelodia algorithm outputs pitch values in Hz, but we can also convert them to MIDI notes using the PitchContourSegmentation algorithm. Here is the default output it provides (tune the parameters for better note estimation; see the sketch after the output below).

onsets, durations, notes = es.PitchContourSegmentation(hopSize=128)(pitch_values, audio)
print("MIDI notes:", notes) # Midi pitch number
print("MIDI note onsets:", onsets)
print("MIDI note durations:", durations)
MIDI notes: [63. 64. 65. 65. 64. 63. 63. 64. 62. 60. 55. 60. 60. 61. 63. 61. 60. 59.
 61. 60. 61. 63. 61. 60. 61. 63. 61. 61. 60. 60. 60. 61. 63. 65. 65. 68.
 67. 65. 63. 55. 63. 64. 63. 65. 66. 65. 63. 62. 63. 65. 64. 61. 56.]
MIDI note onsets: [ 0.19156462  0.49052155  0.8562358   1.3960998   1.6457143   1.7502041
  1.8634014   1.9620862   2.1507483   2.2871656   3.5236282   3.822585
  4.2405443   4.4640365   4.571429    4.6701136   4.8297505   4.954558
  5.0532427   5.16644     5.265125    5.3928347   5.4915195   5.642449
  5.976236    6.135873    6.2345576   6.382585    6.4899774   6.588662
  6.7453966   6.8440814   6.986304    7.250431    7.4187756   7.5174603
  7.616145    9.00644     9.13415    10.8379135  10.942404   11.041088
 11.296508   11.395193   11.668027   11.766712   11.894422   12.036644
 12.135329   12.321089   12.419773   12.547483   14.085805  ]
MIDI note durations: [0.29315192 0.14222223 0.5340589  0.13641724 0.09868481 0.10739229
 0.09287982 0.18285714 0.13061224 1.2306576  0.29315192 0.39473924
 0.21768707 0.1015873  0.09287982 0.1538322  0.11900227 0.09287982
 0.10739229 0.09287982 0.12190476 0.09287982 0.14512472 0.14802721
 0.1538322  0.09287982 0.14222223 0.1015873  0.09287982 0.1509297
 0.09287982 0.13641724 0.258322   0.16253968 0.09287982 0.09287982
 1.3844898  0.12190476 0.14222223 0.09868481 0.09287982 0.2496145
 0.09287982 0.26702946 0.09287982 0.12190476 0.13641724 0.09287982
 0.17995465 0.09287982 0.12190476 0.11900227 0.14512472]
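
For reference, these note numbers follow the standard MIDI convention, where A4 = 440 Hz is note 69. A minimal sketch of the Hz-to-MIDI mapping (the hz_to_midi helper is ours, for illustration only):

import math

def hz_to_midi(f_hz):
    # 12 semitones per octave; MIDI note 69 is A4 = 440 Hz.
    return 69 + 12 * math.log2(f_hz / 440.0)

print(round(hz_to_midi(311.13)))  # -> 63 (Eb4), matching the first note above

If the default segmentation is too fragmented or too coarse, parameters such as minDuration (minimum note length in seconds) and pitchDistanceThreshold (pitch threshold in cents for splitting a contour into notes) can be adjusted; for example (values are illustrative, not tuned for this recording):

onsets2, durations2, notes2 = es.PitchContourSegmentation(hopSize=128, minDuration=0.05,
                                                          pitchDistanceThreshold=50)(pitch_values, audio)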

We can now export the results to a MIDI file. We will use the mido Python package (which you can install with pip install mido) to generate the .mid file. You can test the result by loading the generated .mid file in a DAW.

import mido

PPQ = 96 # Pulses per quarter note.
BPM = 120 # Assumed tempo (e.g., the default when building a MIDI clip in Ableton Live).
tempo = mido.bpm2tempo(BPM) # Microseconds per beat.

# MIDI delta times are relative to the previous event. For each note,
# compute the silence preceding it and its duration, both in seconds.
offsets = onsets + durations
silence_durations = [onsets[0]] + list(onsets[1:] - offsets[:-1])

mid = mido.MidiFile(ticks_per_beat=PPQ)
track = mido.MidiTrack()
mid.tracks.append(track)

for note, silence_duration, duration in zip(notes, silence_durations, durations):
    # note_on fires after the silence since the previous note ended;
    # note_off fires after the note's duration.
    track.append(mido.Message('note_on', note=int(note), velocity=64,
                              time=int(mido.second2tick(silence_duration, PPQ, tempo))))
    track.append(mido.Message('note_off', note=int(note),
                              time=int(mido.second2tick(duration, PPQ, tempo))))

midi_file = temp_dir.name + '/extracted_melody.mid'
mid.save(midi_file)
print("MIDI file location:", midi_file)
MIDI file location: /tmp/tmpf6zzwakg/extracted_melody.mid
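
As an optional sanity check, you can read the file back with mido and print its messages (iterating a MidiFile yields messages with delta times converted to seconds):

for msg in mido.MidiFile(midi_file):
    if not msg.is_meta:
        print(msg)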