[O11] Multimodal Blind Source Separation for Robot Audition
Robots, whether operated or autonomous, are widely used in areas such as manufacturing, transport, earth and space exploration, health care and weaponry. The ability to sense and interact with its environment is important if a robot is to mimic certain aspects of intelligent human behaviour. Audition is one such indispensable sense, used by humans and animals to recognise their environment in daily life. It is therefore highly desirable for a robot to have, to some extent, the hearing abilities of a human. Unfortunately, despite not being limited to only two sensors, robots are still far from approaching the hearing capabilities inherent to the human auditory system. A great deal of research in robot audition has been done in the audio domain. It is known, however, that humans do not rely on their auditory organs alone: they can also infer the meaning of spoken sentences by reading the movements of the mouth and facial muscles. In other words, human speech is inherently bimodal, involving both audio and visual information in production and in perception. This project uses both the audio and visual modalities to separate target speech from multiple competing speech interferences and other sound sources in room environments, for a robotic system. The ultimate goal is to make progress towards machine perception of auditory scenes in uncontrolled natural environments by using visual information to enhance audio-based blind source separation algorithms.
Project Supervisor
Wenwu Wang is currently a Lecturer at the Centre for Vision, Speech and Signal Processing, University of Surrey, which he joined in May 2007. Prior to this, he was a Postdoctoral Research Associate at King's College London (from May 2002 to December 2003) and Cardiff University (from January 2004 to April 2005). He also worked in UK industry, first as a DSP Engineer at Tao Group Ltd (now Antix Labs Ltd) (from May 2005 to August 2006), then as an R&D Engineer at Creative Labs (from September 2006 to April 2007). During spring 2008, he was a visiting scholar at the Perception and Neurodynamics Lab and the Center for Cognitive Science, The Ohio State University. He obtained his PhD degree in April 2002 from Harbin Engineering University, China. His research interests include blind signal processing, audio-visual signal processing, machine learning and perception, and machine audition (listening). He is a member of the IEEE and of the IEEE Signal Processing Society and the IEEE Circuits and Systems Society. He has served, or currently serves, as a reviewer, program committee member, or editor for a number of international journals and conferences.



