NOTE: This is older work from my PhD. I have archived a working version of my code. Please contact me via email if you wish to have a copy as I’m no longer working in this field.

My PhD research was in the field of Audio Signal Processing: tracking acoustic sources (people talking) as they move around a room, based on recordings made with an ad-hoc array of microphones. The tracking was carried out using (Bayesian) particle filters.

For this research, 12 microphones were placed around the speaker. By comparing the signals recorded at each microphone, it is possible to estimate the relative difference in path length travelled by the sound from the speaker to each of the microphones. By combining all of this information, a 2-D or 3-D position can be estimated. Tracking was carried out with a particle filter because the range information extracted from the speech signals is very noisy.
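A common way to compare two microphone signals is the generalized cross-correlation (GCC): the lag of the GCC peak gives the time difference of arrival (TDOA), and multiplying it by the speed of sound gives the relative path difference. The following is a minimal NumPy sketch that assumes the PHAT (phase transform) weighting, which the text above does not specify; `gcc_phat`, its parameters and the 343 m/s speed of sound are illustrative assumptions, not taken from the archived code.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (s) of `sig` relative to `ref` with GCC-PHAT.

    A positive delay means `sig` arrives later than `ref`. Returns the delay
    and the cross-correlation over lags -max_shift..+max_shift.
    """
    n = len(sig) + len(ref)  # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = (n - 1) // 2
    if max_tau is not None:  # optionally limit the search to physically possible lags
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so lag 0 sits in the middle: lags -max_shift .. +max_shift
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs, cc


# Usage: a white-noise "speech" stand-in, delayed by 12 samples at the second microphone
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
sig = np.concatenate((np.zeros(12), ref[:-12]))  # sig lags ref by 12 samples
tau, cc = gcc_phat(sig, ref, fs)
path_diff = tau * SPEED_OF_SOUND  # relative path difference in metres
print(f"TDOA = {tau * 1e3:.3f} ms, path difference = {path_diff:.3f} m")
```

PHAT keeps only the phase of the cross-spectrum, which tends to sharpen the peak under reverberation. The optional `max_tau` argument restricts the search to lags no larger than the microphone spacing divided by the speed of sound.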

During the course of my PhD, I made three specific contributions:

  • Estimating speaker orientation, using the amplitude of GCC (generalized cross-correlation) peaks as orientation cues (DAFX2006)
  • Applying the Track Before Detect method to efficiently track a speaker using 1000 particles at low computational cost (WASPAA 2007 & IEEE Trans ASL 2010)
  • Variable-dimension particle filters to observe, identify and track 1-3 sound sources entirely probabilistically (HSCMA 2008)
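To illustrate the general tracking step, here is a generic bootstrap (sampling importance resampling) particle filter for a single source in 2-D. It is not the Track Before Detect or variable-dimension filter from the contributions above, only a minimal sketch assuming a random-walk motion model, TDOAs measured relative to microphone 0, a Gaussian likelihood and systematic resampling; the microphone layout, noise levels and function names are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def predicted_tdoas(pos, mics):
    """TDOA (s) at each microphone relative to microphone 0, for positions pos (N x 2)."""
    dists = np.linalg.norm(pos[:, None, :] - mics[None, :, :], axis=2)  # N x M
    return (dists[:, 1:] - dists[:, :1]) / SPEED_OF_SOUND  # N x (M - 1)


def pf_step(particles, weights, tdoa_meas, mics, rng, motion_std=0.05, tdoa_std=1e-4):
    """One predict / update / resample cycle of a bootstrap particle filter."""
    # Predict: hypothetical random-walk motion model (std in metres per step)
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: weight each particle by a Gaussian likelihood of the measured TDOAs
    err = predicted_tdoas(particles, mics) - tdoa_meas
    log_w = np.log(weights + 1e-300) - 0.5 * np.sum((err / tdoa_std) ** 2, axis=1)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    # Resample (systematic) when the effective sample size falls below N/2
    n = len(w)
    if 1.0 / np.sum(w ** 2) < n / 2:
        u = (rng.random() + np.arange(n)) / n
        idx = np.minimum(np.searchsorted(np.cumsum(w), u), n - 1)
        particles, w = particles[idx], np.full(n, 1.0 / n)
    return particles, w


# Usage: a stationary source tracked with 1000 particles in a 4.5 m x 4.5 m room
rng = np.random.default_rng(0)
angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
mics = 2.25 + 2.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # hypothetical circle of 12 mics
true_pos = np.array([1.0, 3.0])
particles = rng.uniform(0.0, 4.5, size=(1000, 2))  # uniform prior over the room
weights = np.full(1000, 1.0 / 1000)
for _ in range(20):
    # Simulated noisy TDOA measurements (11 values: mics 1..11 relative to mic 0)
    meas = predicted_tdoas(true_pos[None, :], mics)[0] + rng.normal(0.0, 2e-5, 11)
    particles, weights = pf_step(particles, weights, meas, mics, rng)
estimate = weights @ particles  # weighted mean position
print("estimate:", estimate, "error (m):", np.linalg.norm(estimate - true_pos))
```

The weighted mean of the particles gives the position estimate, and the spread of the particle cloud gives a rough measure of confidence.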

Single-speaker experiment (input on left, output on right): one person speaking while moving around a room. The input shows the Steered Response Function over the room (4.5 m × 4.5 m); peaks represent areas where the speaker is likely to have been. However, this surface cannot be evaluated online during operation, as doing so requires significant computation. Our work developed ways of ‘sensing’ this surface without evaluating it. The output shows the results of our SMC (Sequential Monte Carlo) tracking algorithms: the peak represents the estimated location, and the concentration of the surface around the peak indicates our confidence in the tracking accuracy. Note that the peak is not perfectly stable when the source moves across certain edges; these correspond to the edges of our Existence Grid cells, and a simple modification ought to fix that issue.
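To make the computational cost concrete, the following sketch evaluates a steered response surface by brute force, assuming the common SRP-PHAT formulation: for each grid point, sum the GCC-PHAT value of every microphone pair at the lag that point would produce. The function names, the circular 12-microphone layout, the 0.1 m grid and the simulated anechoic signals are hypothetical illustrations, not the experimental setup.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def gcc_phat_circ(x, y, n):
    """PHAT-weighted cross-correlation; index k holds lag k (k < n/2) or lag k - n."""
    cross = np.fft.rfft(x, n=n) * np.conj(np.fft.rfft(y, n=n))
    return np.fft.irfft(cross / (np.abs(cross) + 1e-12), n=n)


def srp_map(signals, mics, fs, grid_x, grid_y):
    """Brute-force steered response: for every grid point, sum the GCC-PHAT
    value of every microphone pair at the lag that point would produce."""
    n = 2 * signals.shape[1]  # zero-pad to avoid circular wrap-around
    gx, gy = np.meshgrid(grid_x, grid_y, indexing="ij")
    pts = np.stack([gx.ravel(), gy.ravel()], axis=1)  # P x 2
    dists = np.linalg.norm(pts[:, None, :] - mics[None, :, :], axis=2)  # P x M
    power = np.zeros(len(pts))
    m = len(mics)
    for i in range(m):
        for j in range(i + 1, m):  # M(M-1)/2 pairs: 66 for 12 microphones
            cc = gcc_phat_circ(signals[i], signals[j], n)
            lag = np.rint((dists[:, i] - dists[:, j]) / SPEED_OF_SOUND * fs).astype(int)
            power += cc[lag % n]  # negative lags wrap to the end of the array
    return power.reshape(gx.shape)


# Usage: simulate one source with fractional delays applied in the frequency domain
fs = 16000
rng = np.random.default_rng(1)
angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
mics = 2.25 + 2.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
src = np.array([1.0, 3.0])
n_sig = 4096
freqs = np.fft.rfftfreq(n_sig, d=1.0 / fs)
spectrum = np.fft.rfft(rng.standard_normal(2048), n=n_sig)
delays = np.linalg.norm(mics - src, axis=1) / SPEED_OF_SOUND
signals = np.stack([np.fft.irfft(spectrum * np.exp(-2j * np.pi * freqs * d), n=n_sig) for d in delays])
grid = np.linspace(0.0, 4.5, 46)  # 0.1 m spacing over the 4.5 m x 4.5 m room
pmap = srp_map(signals, mics, fs, grid, grid)
ix, iy = np.unravel_index(np.argmax(pmap), pmap.shape)
print("peak at", grid[ix], grid[iy])
```

Even this small example needs 66 cross-correlations plus 46 × 46 = 2116 lookups per pair in every frame, and the number of grid points grows quadratically with resolution in 2-D (cubically in 3-D), which is why evaluating the full surface online is costly.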

Multi-person experiment (input video on left, output on right): two people speaking while moving around the room. Note that the frequency of measurements for each speaker is much lower than in the single-speaker case (Sample 1). Again, it is not possible to evaluate this surface online. The output shows the results of our SMC tracking algorithms, now tracking both speakers.