Music fingerprinting with Chromaprint

Music track identification with AcoustID

Chromaprint is an acoustic fingerprinting algorithm based on chroma features. AcoustID is a web service that relies on chromaprints to identify tracks from the MusicBrainz database.

Queries to AcoustID require a client key, the duration of the song, and its fingerprint. In addition, a query can request extra metadata from the MusicBrainz database using the meta field.

Essentia provides a wrapper algorithm, Chromaprinter, for computing fingerprints with the Chromaprint library. In standard mode, a fingerprint is computed for the entire audio duration. It can then be used to generate a query as in the example below:

# the specified file is not provided with this notebook, try using your own music instead
import essentia.standard as es

audio = es.MonoLoader(filename='Desakato-Tiempo de cobardes.mp3', sampleRate=44100)()
fingerprint = es.Chromaprinter()(audio)

client = 'hGU_Gmo7vAY'  # This is not a valid key. Use your key.
duration = len(audio) / 44100.

# Composing a query asking for the fields: recordings, releasegroups and compress.
query = 'http://api.acoustid.org/v2/lookup?client=%s&meta=recordings+releasegroups+compress&duration=%i&fingerprint=%s' \
        % (client, duration, fingerprint)

from six.moves import urllib
page = urllib.request.urlopen(query)
print(page.read())
{"status": "ok", "results": [{"recordings": [{"artists": [{"id": "b02d0e59-b9c2-4009-9ebd-bfac7ffa0ca3", "name": "Desakato"}], "duration": 129, "releasegroups": [{"type": "Album", "id": "a38dc1ea-74fc-44c2-a31c-783810ba1568", "title": "La Teor\u00eda del Fuego"}], "title": "Tiempo de cobardes", "id": "9b91e561-a9bf-415a-957b-33e6130aba76"}], "score": 0.937038, "id": "23495e47-d670-4e86-bd08-b1b24a84f7c7"}]}

Chromaprints can also be computed in real time using the streaming mode of the algorithm. In this case, a fingerprint is computed every analysisTime seconds.

For the use-case shown in the previous example, the fingerprint can be internally stored until the end of the signal using the concatenate flag (True by default). In this case, only one chromaprint is returned after the end of the audio stream.

import essentia
import essentia.streaming as ess

loader = ess.MonoLoader(filename='Music/Desakato-La_Teoria_del_Fuego/01. Desakato-Tiempo de cobardes.mp3')
fps = ess.Chromaprinter(analysisTime=20, concatenate=True)
pool = essentia.Pool()

# Connecting the algorithms
loader.audio >> fps.signal
fps.fingerprint >> (pool, 'chromaprint')

essentia.run(loader)

fp = pool['chromaprint'][0]

To make fingerprints easier to handle, they are compressed and encoded as character arrays. We can use the decode_fingerprint function from the pyacoustid package to obtain a numerical representation of the chromaprint and visualize it.

import acoustid as ai
import numpy as np

fp_int = ai.chromaprint.decode_fingerprint(fp)[0]

fb_bin = [list('{:032b}'.format(abs(x))) for x in fp_int]  # Int to unsigned 32-bit binary array

arr = np.zeros([len(fb_bin), len(fb_bin[0])])

for i in range(arr.shape[0]):
    arr[i, 0] = int(fp_int[i] > 0)  # The sign is stored in the first bit
    for j in range(1, arr.shape[1]):
        arr[i, j] = float(fb_bin[i][j])

plt.imshow(arr.T, aspect='auto', origin='lower')
plt.title('Binary representation of a Chromaprint ')

Query audio segment identification within an audio track

Chromaprints can also be used to identify and locate a query audio segment within a longer recording. This can be useful in various applications, for instance, song identification in radio or DJ streams.

In this example, we will use a short musical phrase from a Mozart recording (marked by red lines in the audio plot below), low-pass filtered for audio degradation, as a query. We will locate this query within the entire original recording.

The code is an adaptation of a gist by Lukáš Lalinský.
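The matching idea can be sketched in plain Python: decoded chromaprints are sequences of 32-bit integers, and a query is located by sliding it along the reference and counting differing bits (the Hamming distance) at each offset. The short arrays below are synthetic stand-ins, not real Chromaprint output:

```python
# Synthetic 32-bit subfingerprints (stand-ins for decoded chromaprints)
reference = [0b1010, 0b1100, 0b1111, 0b0001, 0b0110, 0b1011]
query     = [0b1111, 0b0011]  # resembles reference[2:4] with one bit flipped

def bit_errors(a, b):
    """Total number of differing bits between two equal-length int sequences."""
    return sum(bin(x ^ y).count('1') for x, y in zip(a, b))

# Slide the query over the reference and keep the offset with fewest bit errors
errors = [bit_errors(reference[i:i + len(query)], query)
          for i in range(len(reference) - len(query) + 1)]
best_offset = errors.index(min(errors))
print('best offset:', best_offset, 'bit errors:', min(errors))  # best offset: 2 bit errors: 1
```

The real implementation works the same way, only over much longer integer sequences and with a bit-error threshold to decide whether the best offset is an actual match.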

from pylab import plot, show, figure, imshow
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams['figure.figsize'] = (15, 6) # set plot sizes to something larger than default

fs = 44100
audio = es.MonoLoader(filename='../../../test/audio/recorded/mozart_c_major_30sec.wav', sampleRate=fs)() # this song is not available
time = np.linspace(0, len(audio)/float(fs), len(audio))

plot(time, audio)

# Segment limits
start = 19.1
end = 26.3

plt.axvline(x=start, color='red', label='fragment location')
plt.axvline(x=end, color='red')
plt.legend()

This is how the track sounds:

import IPython
IPython.display.Audio(audio, rate=44100)