Abstract
In the field of speech signal processing, speech
source mixture separation is a known challenge. It is addressed
by finding the closest estimate of the original speech source from
the speech mixture. Source separation solutions can be based on
multi-channel or single-channel models. The multi-channel model
assumes multiple speakers and multiple microphones, while the
single-channel model assumes multiple speakers and a single microphone.
One of the most widely used algorithms in the single-channel
model is the Ideal Ratio Mask (IRM). Although IRM is efficient,
it has a major drawback: a high memory footprint, as it
stores all frequency components of the Short-time Fourier
transform (STFT). This makes it less suitable for embedded
applications. We propose a solution based on the optimization of
Mel-frequency Cepstral Coefficient (MFCC) and Non-centroid
K-nearest neighbor (Nk-nn) algorithms that minimizes memory
utilization and achieves high Signal-to-Interference Ratio (SIR).
Our experimental results show that the proposed solution
improves SIR while minimizing memory requirements compared
to the reference IRM.
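
To illustrate the memory footprint attributed to IRM above, the following is a minimal sketch of a ratio mask computed over STFT magnitudes. It is not the paper's implementation: the formula is the commonly used power-ratio form with exponent 0.5, and the function names, the synthetic sinusoidal "speakers", the sampling rate, and the frame parameters (`n_fft`, `hop`, `beta`) are all assumptions chosen for illustration.

```python
import cmath
import math


def stft_mag(signal, n_fft=64, hop=32):
    """Magnitude STFT with a Hann window: frames x (n_fft // 2 + 1) bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(n_fft)]
        mags = []
        for k in range(n_fft // 2 + 1):
            # Naive DFT of one windowed frame (kept simple, not fast).
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n in range(n_fft))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def ideal_ratio_mask(target_mag, interferer_mag, beta=0.5, eps=1e-12):
    """Assumed IRM form: (|S|^2 / (|S|^2 + |V|^2)) ** beta per time-frequency bin."""
    return [
        [(s * s / (s * s + v * v + eps)) ** beta for s, v in zip(s_row, v_row)]
        for s_row, v_row in zip(target_mag, interferer_mag)
    ]


# Hypothetical stand-ins for two speakers: pure tones at 500 Hz and 1500 Hz.
fs = 8000
num_samples = 512
speaker_a = [math.sin(2 * math.pi * 500 * t / fs) for t in range(num_samples)]
speaker_b = [math.sin(2 * math.pi * 1500 * t / fs) for t in range(num_samples)]

mask = ideal_ratio_mask(stft_mag(speaker_a), stft_mag(speaker_b))

# One mask value is stored per frequency bin per frame.
values_per_frame = len(mask[0])
print(len(mask), "frames x", values_per_frame, "bins per frame")
```

In this sketch the mask holds one value for every STFT frequency bin in every frame, so its size grows with the FFT length; this is the kind of storage cost the abstract points to as a limitation for embedded use.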
Original language | English |
---|---|
Title of host publication | IEEE Sensor Array and Multichannel Signal Processing Workshop |
Publisher | IEEE |
Publication status | Accepted/In press - Jun 2022 |
MoE publication type | A4 Article in a conference publication |