LARGE-SCALE CLUSTERING OF CHROMAS


This project deals with clustering of harmonic patterns. The main goals are to scale to large datasets and to discover interesting patterns.

This webpage is a work in progress. If you can't find all the necessary code here, don't hesitate to email me and I'll send it to you.


CODE

All the code in this project is in Python. Compared to Matlab, it is more efficient, scalable, free to distribute, etc. It is easy to pick up if you don't know anything about it! To test this code, I suggest using IPython. Our code was mostly developed in Python 2.5 with SciPy/NumPy. Having the scikits ANN library speeds things up a lot, but it is not required. This tutorial may be incomplete, and the code might have bugs! Please write me an email (tb2332 @ columbia . edu) if you have any trouble using it.

You will also need an Echo Nest API account. It's free; you simply have to register. Note that there is a limit on the number of API calls per minute.

CREATE / GATHER DATA

In this part, you upload songs that you own to the Echo Nest API and receive their analysis. This analysis is saved as a Matlab file, one per song (yes, Python can deal with Matlab files). Each Matfile contains a per-beat analysis, meaning one 12-dimensional chroma vector for each beat identified by the Echo Nest.

Download: features.py
In IPython:
import features as FEAT
# allsongpath: your list of paths to audio files (define it first)
for songpath in allsongpath:
    FEAT.filename_to_beatfeat_mat(songpath, matfilepath)
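
To give an idea of what working with such a Matfile looks like, here is a minimal sketch that writes and reads back a per-beat chroma matrix with scipy.io. It is not part of features.py: the variable name "btchroma" and the 12 x nbeats layout are assumptions made for illustration, so check the keys of your own Matfiles before relying on them. The data is synthetic.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Hypothetical variable name inside the Matfile; the real files may use another key.
CHROMA_KEY = "btchroma"


def load_beat_chroma(matpath, key=CHROMA_KEY):
    """Load a per-beat chroma matrix, assumed to be stored as 12 x nbeats."""
    data = scipy.io.loadmat(matpath)
    chroma = np.asarray(data[key], dtype=float)
    if chroma.ndim != 2 or chroma.shape[0] != 12:
        raise ValueError("expected a 12 x nbeats matrix, got shape %s" % (chroma.shape,))
    return chroma


# Round trip on synthetic data: 12 chroma bins for 40 beats.
demo = np.random.default_rng(0).random((12, 40))
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demo_song.mat")
    scipy.io.savemat(demo_path, {CHROMA_KEY: demo})
    loaded = load_beat_chroma(demo_path)

print(loaded.shape)
```

Calling scipy.io.loadmat on one of your own files and printing the keys of the returned dictionary is the quickest way to see what is actually stored.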

TRAIN MODEL

From a set of Matfiles (previous section), we will train a codebook using vector quantization.

Download: model.py oracle_matfiles.py initializer.py trainer.py
We assume that all matfiles created above are in some subdirectory of "matdirectory".
The experiments will be saved in subdirs of "./expdir".
We will train a codebook of 100 codewords, each codeword representing 2 bars encoded as 8 beats.
To initialize we create codebook.mat (command line):
python initializer.py -pSize 8 -usebars 2 -oraclemat matdirectory 100 codebook.mat
To train for 10K iterations, i.e. 10K songs drawn at random from matdirectory (command line):
python trainer.py -pSize 8 -usebars 2 -lrate 1e-3 -oraclemat matdirectory -expdir ./expdir codebook.mat
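
For intuition, here is a minimal sketch of online vector quantization in NumPy. It is not the code in trainer.py, only a generic version of the idea under stated assumptions: each pattern is flattened to 12 x 8 = 96 values, distances are squared Euclidean, and the closest codeword is moved toward each new pattern by the learning rate. The function names and the random data are made up for illustration.

```python
import numpy as np


def nearest_codeword(codebook, pattern):
    """Return (index, squared Euclidean distance) of the closest codeword."""
    dists = np.sum((codebook - pattern) ** 2, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])


def online_vq_step(codebook, pattern, lrate=1e-3):
    """Move the winning codeword a small step toward the pattern (in place)."""
    idx, _ = nearest_codeword(codebook, pattern)
    codebook[idx] += lrate * (pattern - codebook[idx])
    return idx


rng = np.random.default_rng(0)
n_codewords, psize = 100, 8
dim = 12 * psize  # 12 chroma bins x 8 beats, flattened

# Synthetic stand-in for beat-chroma patterns taken from real songs.
patterns = rng.random((1000, dim))

# Initialize the codebook with randomly chosen patterns.
start = rng.choice(len(patterns), size=n_codewords, replace=False)
codebook = patterns[start].copy()

# One pass over the data, one update per pattern.
for pattern in patterns:
    online_vq_step(codebook, pattern, lrate=1e-3)

print(codebook.shape)
```

A small learning rate such as 1e-3 means each pattern only nudges its codeword, so a codeword ends up as a running average of the patterns assigned to it.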

TEST MODEL - ENCODE A SONG

Download: encode_song.py
To see the encoding using a codebook and the distortion:
python encode_song.py -pSize 8 -usebars 2 song_enfeats.mat codebook.mat
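
To make "encoding" and "distortion" concrete, here is a minimal sketch, assuming each pattern is replaced by the index of its closest codeword and the distortion is the mean squared Euclidean distance to that codeword. This is an illustration only; encode_song.py may measure distortion differently, and the function name and toy data are made up.

```python
import numpy as np


def encode_patterns(patterns, codebook):
    """Return (codeword indices, mean squared-error distortion)."""
    # Pairwise squared distances, shape (n_patterns, n_codewords).
    diffs = patterns[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    indices = np.argmin(dists, axis=1)
    # Distance from each pattern to the codeword chosen for it.
    chosen = dists[np.arange(len(patterns)), indices]
    return indices, float(np.mean(chosen))


# Toy example: 2 codewords and 3 patterns in 2 dimensions.
toy_codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
toy_patterns = np.array([[0.0, 0.0], [1.0, 1.0], [0.9, 1.1]])
indices, distortion = encode_patterns(toy_patterns, toy_codebook)
print(indices, distortion)
```

The sequence of indices is the encoded song; a lower distortion means the codebook represents the song's patterns more closely.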


This project is described in the following paper:

T. Bertin-Mahieux, R. Weiss and D. Ellis, Clustering beat-chroma patterns in a large music database, In Proceedings of the 11th International Conference on Music Information Retrieval (ISMIR), 2010. [pdf]

 

