Sound source coding in the azimuthal plane: separating sounds via short-term interaural time difference estimations
Keywords: MSO, LSO, Interaural Time Difference, Sound Localization, Sound Separation
Author: Groß, Sebastian
Year: 2021
Language: English
Publisher: Universitätsbibliothek der Ludwig-Maximilians-Universität München
Citation: Groß, Sebastian (2021): Sound source coding in the azimuthal plane: separating sounds via short-term interaural time difference estimations. Dissertation, LMU München: Graduate School of Systemic Neurosciences (GSN).
Full text: Gross_Sebastian.pdf (PDF, 11MB)

Abstract

The interaural time difference (ITD) is the main cue for sound localization in the azimuthal plane for low-frequency sounds (below ~2 kHz). The extractors of this cue are neurons in two nuclei of the mammalian auditory brainstem, the medial superior olive (MSO) and the low-frequency limb of the lateral superior olive (lLSO). The read-out mechanism at the population level is unknown, since single neurons show different responses to frequency-varying stimuli. This poses a challenge especially for natural sound stimuli and complex auditory scenes, which cover a wide range of frequencies, i.e., have a very broad spectrum. To find an encoder of ITDs, we have developed so-called effective population models of the human MSO and lLSO. They are effective in the sense that each individual neuron is identified by the three defining properties that determine its frequency-dependent ITD tuning: the best frequency (BF), the characteristic delay (CD), and the characteristic phase (CP). We have formulated an ITD decoding strategy in the 2D space spanned by the lLSO and MSO membrane potentials. From each hemisphere, a separate ITD can be decoded. These two estimates can be weighted against each other to retrieve the location of sound sources in the horizontal plane. To this end, we make use of so-called short-term ITDs, which are successive estimates obtained in small time windows. Our results indicate that sound localization can be performed correctly in time windows as short as 1 ms. To perform sound separation of stimuli within complex auditory scenes, we fit Gaussian Mixture Models to the distributions of short-term ITD estimates. The results show that sound separation can be performed reliably when the long-term ITD estimate (a distribution of short-term ITDs) is accumulated over a time interval longer than 1 s. Furthermore, we conclude that sounds can be separated and reconstructed from complex auditory scenes based solely on a single auditory cue, the ITD.
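
To illustrate the notion of short-term ITDs, the sketch below produces one ITD estimate per short time window by cross-correlating the two ear signals. This is only a stand-in for intuition, not the population-model decoder described in the abstract; the function name, default window length, and the cross-correlation approach itself are assumptions made for illustration.

```python
import numpy as np

def short_term_itds(left, right, fs, win_ms=1.0, max_itd_s=800e-6):
    """One illustrative ITD estimate per short window via cross-correlation.

    Hypothetical parameters:
      left, right : 1-D arrays holding the two ear signals (same length)
      fs          : sampling rate in Hz
      win_ms      : window length in ms (the abstract reports windows as short as ~1 ms)
      max_itd_s   : plausible human ITD range, roughly +/-800 microseconds
    """
    win = int(round(win_ms * 1e-3 * fs))
    max_lag = int(round(max_itd_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    itds = []
    # Slide non-overlapping windows; keep a margin so lagged indices stay in range.
    for start in range(max_lag, len(left) - win - max_lag, win):
        l_seg = left[start:start + win]
        # Correlate the left-ear window against lagged right-ear segments.
        cc = [np.dot(l_seg, right[start + k:start + k + win]) for k in lags]
        # Best lag converted to seconds; positive when the left ear leads.
        itds.append(lags[int(np.argmax(cc))] / fs)
    return np.array(itds)
```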
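A minimal sketch of the separation step, assuming scikit-learn is available: a Gaussian Mixture Model is fitted to the distribution of short-term ITD estimates, each window is assigned to a component, and the component means act as per-source long-term ITD estimates. The function name, the fixed number of sources, and the use of scikit-learn are illustrative assumptions, not the implementation used in the thesis.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def separate_by_itd(itds, n_sources=2, random_state=0):
    """Cluster short-term ITDs (in seconds) into n_sources Gaussian components."""
    X = np.asarray(itds).reshape(-1, 1)  # GaussianMixture expects a 2-D array
    gmm = GaussianMixture(n_components=n_sources, random_state=random_state).fit(X)
    labels = gmm.predict(X)              # per-window source assignment
    return gmm.means_.ravel(), labels    # component means = per-source ITDs

# Hypothetical usage together with the sketch above:
# itds = short_term_itds(left, right, fs=44100, win_ms=1.0)
# source_itds, labels = separate_by_itd(itds, n_sources=2)
```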