Centralized Management System for Advanced Multimedia Production Environments (MOG TIDT-II)

Research group:
Sound and Music Computing
Research and Development
Principal investigator:
Luís Gustavo Martins (55%)
Team members (% allocation):
André Perrotta (MSc - 50%), Álvaro Barbosa (PhD - 20%), Luís Teixeira (PhD - 15%)

Music is built from sound, ultimately resulting from an elaborate interaction between the sound-generating properties of physical objects (i.e. musical instruments) and the sound perception abilities of the human auditory system. Humans, even without any kind of formal music training, are typically able to extract, almost unconsciously, a great amount of relevant information from a musical signal. The beat of a musical piece, the main melody of a complex musical arrangement, the sound sources and events occurring in a complex musical mixture, and the song structure are just some examples of what a naive listener is commonly able to extract just from listening to a musical piece. To do so, the human auditory system uses a variety of cues for perceptual grouping, such as similarity, proximity, harmonicity and common fate, among others [1].

Typical computational systems for sound analysis and Music Information Retrieval (MIR) statistically represent the entire polyphonic or complex sound mixture (e.g. [2, 3]), without any attempt to first identify the different sound entities or events that may coexist in the signal. There is, however, some evidence that this approach has reached a 'glass ceiling' [4] in terms of analysis and retrieval performance.
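To make the limitation concrete, the following is a minimal sketch of a whole-mixture statistical summary. It is an illustration only and not the method of [2, 3]: the choice of spectral centroid as the frame-level feature, the function names (`magnitude_spectrum`, `spectral_centroid`, `global_summary`), the frame size and the toy two-tone signal are all assumptions made for this example.

```python
import cmath
import math
from statistics import mean, pstdev


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short illustrative frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def global_summary(signal, sample_rate, frame_size=64):
    """Summarize the whole mixture by the mean and spread of a frame-level feature."""
    frames = [
        signal[i:i + frame_size]
        for i in range(0, len(signal) - frame_size + 1, frame_size)
    ]
    centroids = [spectral_centroid(f, sample_rate) for f in frames]
    return mean(centroids), pstdev(centroids)


# Hypothetical toy "mixture": two simultaneous tones at 500 Hz and 2000 Hz.
sr = 8000
mixture = [
    math.sin(2 * math.pi * 500 * t / sr) + math.sin(2 * math.pi * 2000 * t / sr)
    for t in range(512)
]
summary = global_summary(mixture, sr)
```

In this toy case the mean centroid falls between the two tones (around 1250 Hz), so the summary describes the mixture as a whole but neither of the two sound events in it.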

The main problem this project addresses is the identification and segregation of sound events in 'real-world' polyphonic music signals (including monaural audio signals). The goal is to individually characterize the different sound events comprising the polyphonic mixture, and use this structured representation to improve the extraction of perceptually relevant information from complex audio and musical mixtures.

The proposed project will follow a Computational Auditory Scene Analysis (CASA) approach for modeling perceptual grouping in music listening [5]. This approach is inspired by the current knowledge of how listeners perceive sound events in music signals, be it music notes, harmonic textures, melodic contours, instruments or other types of event [1], and it requires a multidisciplinary approach to the problem [6, p. 14]. Although the demanding challenges faced by such CASA approaches still leave their performance quite limited when compared to the human auditory system, some recent results already provide alternative and improved approaches to common sound analysis and MIR applications [T1].
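As a rough illustration of what a single grouping cue can look like in code, the sketch below groups spectral peaks by harmonicity, one of the perceptual cues mentioned earlier. It is not the project's framework: the function names (`harmonic_group`, `segregate_by_harmonicity`), the relative tolerance, the greedy assignment and the example peak list are assumptions, and the candidate fundamentals are supplied by hand rather than estimated.

```python
def harmonic_group(peaks_hz, f0_hz, tolerance=0.03):
    """Return the peaks within a relative `tolerance` of an integer multiple of f0_hz."""
    group = []
    for f in peaks_hz:
        harmonic = round(f / f0_hz)  # nearest harmonic number
        if harmonic >= 1 and abs(f - harmonic * f0_hz) <= tolerance * harmonic * f0_hz:
            group.append(f)
    return group


def segregate_by_harmonicity(peaks_hz, candidate_f0s, tolerance=0.03):
    """Greedily assign each peak to the first candidate f0 it is harmonically related to."""
    remaining = list(peaks_hz)
    events = {}
    for f0 in candidate_f0s:
        members = harmonic_group(remaining, f0, tolerance)
        events[f0] = members
        remaining = [f for f in remaining if f not in members]
    return events, remaining


# Hypothetical spectral peaks (Hz) from a mixture of two notes plus one stray peak.
peaks = [220.0, 330.0, 441.0, 659.0, 662.0, 990.0, 1234.0]
events, unassigned = segregate_by_harmonicity(peaks, candidate_f0s=[220.0, 330.0])
```

Note that the peaks near 660 Hz are compatible with both fundamentals (third harmonic of 220 Hz, second harmonic of 330 Hz), so here the greedy order decides where they go. This kind of ambiguity is one reason a single cue is insufficient and several cues have to be combined.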

Project objectives:

The overall purpose of this project is to build upon the research results already obtained by the proposed team, which place it in a good position to combine knowledge from the different disciplines in order to design, implement and validate innovative methodologies and technologies for computer-based sound and music analysis, namely:

  1. an efficient, extensible and open source CASA software framework for modeling perceptual grouping in music listening, which results in a mid-level, structured and perceptually inspired representation of polyphonic music signals,
  2. software technologies for the visualization, sonification, interaction and evaluation of sound events automatically segregated from polyphonic music signals,
  3. evaluation datasets for sound segregation in music signals.

To pursue these objectives, seven tasks have been planned, including research work on:

  • sound analysis front-ends, new grouping cues and sequential grouping methods that model the perceptual mechanisms involved in human hearing,
  • new methods for the extraction of descriptors (e.g. pitch, timbre) directly from the mid-level representation of music signals,
  • design, development and optimization of software modules and framework,
  • contributions to new approaches for the evaluation of computational sound analysis and segregation systems.

Activities and schedule:

Project start: April 4th 2011
Project end: April 3rd 2014

Papers in international journals – 0/3
Communications in international meetings – 4/6
Reports – 2/4
Organization of seminars and conferences – 2/0
PhD theses – 0/1
Models – 1/3
Software – 1/1
Pilot Plants – 0/1
Prototypes – 1/1
Evaluation Datasets – 2/1

Partners: INESC Porto (Portugal), University of Victoria (BC, Canada), IRCAM (France), McGill University/CIRMMT (QC, Canada), FEUP (Portugal)
Funding: Fundação para a Ciência e Tecnologia (FCT)


A Computational Auditory Scene Analysis Framework for Sound Segregation in Music Signals (CASA-FCT)
04/2011 to 04/2014