Décompositions en Éléments Sonores et Applications Musicales

Mathieu Lagrange, Roland Badeau, Bertrand David, Nancy Bertin, Olivier Derrien, Sylvain Marchand, Laurent Daudet


Institut Télécom, Télécom ParisTech, CNRS LTCI Paris


LMA, CNRS - UPR 7051 Marseille

Université de Bretagne Occidentale, UFR Sciences & Techniques, Brest

Université Paris Diderot, Institut Langevin CNRS UMR 7587 Paris

This paper presents the DESAM project, which was divided into two parts. The first was devoted to the theoretical and experimental study of parametric and non-parametric techniques for decomposing audio signals into sound elements. The second focused on some musical applications of these decompositions. Most aspects considered in this project have led to new methods, which have been grouped together into the so-called DESAM Toolbox, a set of Matlab® functions dedicated to the estimation of widely used spectral models for music signals. Although these models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. The toolbox is rather aimed at providing a range of state-of-the-art signal processing tools that decompose music recordings according to different signal models, giving rise to different “mid-level” representations.

Extended Abstract

Analyzing a polyphonic recording in order to extract or modify its musical content (e.g. the instruments, the beat, or the notes) is a difficult exercise, even for an experienced musician. The tools described in this paper aim at enabling a computer to perform such tasks. Let us mention three of them:

1. Pitch estimation. Estimating the pitch of a sound (on a scale from low to high) is critical for identifying musical notes, but remains difficult in a polyphonic recording, because of the overlap of sounds in the time and frequency domains.

2. Automatic transcription. While producing a sound from a musical score is relatively easy for both a skilled musician and a computer, the inverse problem, called “automatic transcription”, which aims at recovering a musical score from a recording, proves to be much more complex and requires expert skills.

3. Audio coding. Storing and transmitting an increasing volume of musical recordings requires coding this data in a format that is as compact as possible. This involves a tradeoff between the quantity of coded information and the quality of the reproduced sound.

In order to perform these tasks, one needs a model for polyphonic music. However, no single model can successfully account for all the characteristics of musical tones in general, and for how they are intertwined with one another to form music. Musical notes are primarily characterized by their pitch and their timbre, the latter being specific to the instrument. They can thus be modeled as a mixture of sinusoids, whose frequencies and amplitudes are related to the pitch and timbre of the sound. In order to estimate the fine time variations of these parameters, one needs precise analysis methods, such as the so-called “high-resolution” methods. Besides, since a musical piece is composed of multiple notes played at different times, it is naturally described as a combination of a number of elementary sound elements (which can be either isolated notes, combinations of notes, or parts of notes). Such a representation is called “sparse”, since a very limited number of such sound elements, if well selected, should approximately describe the whole musical content. A complementary framework is based on a mathematical tool called “Non-negative Matrix Factorization” (NMF). It exploits the redundancies in a musical piece (a single tone being generally repeated within the piece), in order to identify the sound elements via their spectral characteristics and their various occurrences through time.
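The NMF idea described above can be illustrated with a minimal sketch. It is not the DESAM Toolbox implementation: it assumes the standard multiplicative updates for the Euclidean cost, uses a toy magnitude "spectrogram" built from two hypothetical note templates, and all names (`nmf`, `W_TRUE`, `H_TRUE`, and so on) are chosen for illustration only. The columns of `W` play the role of spectral templates, and the rows of `H` indicate when each template is active.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def squared_error(v, w, h):
    """Squared Frobenius distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))

def nmf(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (frequencies x frames) as W H.

    Uses multiplicative updates for the Euclidean cost (an assumption of
    this sketch). Returns W, H and the error recorded at each iteration.
    """
    rng = random.Random(seed)
    n_freq, n_frames = len(v), len(v[0])
    # Strictly positive random initialization.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(rank)]
    errors = [squared_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][t] * num[k][t] / (den[k][t] + eps) for t in range(n_frames)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[f][k] * num[f][k] / (den[f][k] + eps) for k in range(rank)]
             for f in range(n_freq)]
        errors.append(squared_error(v, w, h))
    return w, h, errors

# Toy data (hypothetical): two "notes", each with its own spectral template
# over 6 frequency bins, played alone and then together over 8 time frames.
W_TRUE = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0], [0.0, 0.5], [0.25, 0.0], [0.0, 0.0]]
H_TRUE = [[1, 1, 0, 0, 1, 1, 0, 0],
          [0, 0, 1, 1, 1, 1, 0, 0]]
V = matmul(W_TRUE, H_TRUE)

W, H, errors = nmf(V, rank=2)
print("initial error:", errors[0])
print("final error:  ", errors[-1])
```

In this sketch the factorization is only defined up to scaling and permutation of the templates, so `W` and `H` need not match `W_TRUE` and `H_TRUE` entry by entry, even when the reconstruction error becomes small.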

Among the results of the DESAM project, funded by the French ANR (Agence Nationale de la Recherche), we have developed a number of analysis tools:

– an original pitch estimation method, capable of estimating up to ten simultaneous notes, which has been used in an automatic transcription algorithm for piano music.

– another transcription scheme based on NMF, which has been developed for a larger class of instruments.

– a coding method based on high-resolution analysis, which reaches very low bitrates (high compression ratio).

– another coding method, based on sparse decompositions, which yields a scalable audio coder that can reach transparency (perceptually, the compressed sound cannot be distinguished from the original one).

Most aspects considered in this project have led to new algorithms, which have been grouped together into the so-called DESAM Toolbox, a set of Matlab® functions dedicated to the estimation of widely used spectral models for music signals. This paper briefly presents the innovative tools that have been considered in order to build these systems. They are divided into two main parts: the first one is devoted to the theoretical and experimental study of parametric and non-parametric techniques for decomposing audio signals into sound elements; the second part focuses on some musical applications of these decompositions.

Although these models can be used in a wide range of Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. Their goal is rather to provide a wide range of state-of-the-art signal processing tools that decompose music recordings according to different signal models, giving rise to different “mid-level” representations.


This article summarizes the results of the ANR project DESAM (Décompositions en éléments sonores et applications musicales). The project comprised two parts: the first dealt with theoretical advances in techniques for decomposing digital audio signals, and the second with musical applications of these decompositions. Most of the aspects addressed in the project gave rise to new methods and algorithms, which are gathered in a toolbox, the DESAM Toolbox. It brings together a set of Matlab® functions dedicated to the estimation of spectral models that are widely used for music signals. The methods studied in this project can of course be useful for music information retrieval, but above all they constitute a collection of recent tools for decomposing signals according to different models, resulting in varied mid-level representations that may be useful in other application domains.


audio processing, spectral models, sound modeling.


audio signal processing, spectral models, sound modeling.

1. Introduction
2. The DESAM Project
3. Sinusoidal Models
4. Spectral Models and Applications
5. Automatic Transcription by Spectrogram Factorization
6. Discussion
7. Conclusion

