Speech Science and Technology in Hearing Aids
Hearing loss is a prevalent issue: roughly one in six people experience it, and this figure is expected to rise significantly by 2030. The aging population is particularly vulnerable, with one in four people over 65 years old affected. However, younger people are also at risk, because exposure to excessive noise is becoming increasingly common in daily life. With the advent of new technologies and the prevalence of portable music players, young people are exposed to loud music for extended periods, which can cause permanent hearing damage.
The consequences of hearing loss are significant, as it is linked to several adverse health outcomes. Hearing loss can lead to social isolation, loneliness, and depression, and it is associated with cognitive decline and an increased risk of dementia, particularly in older adults. Its impact on quality of life is profound: individuals experience communication difficulties, reduced job opportunities, and diminished overall well-being. The economic impact is also substantial, with estimates suggesting that hearing loss costs billions of dollars annually in lost productivity and healthcare expenditure.
Hearing aids are the most common intervention for hearing loss, but they have limitations, particularly in noisy environments. Many individuals with hearing loss find it challenging to communicate in noisy situations such as restaurants, public transport, or social gatherings, and the inability to hear adequately in these settings can lead to frustration and further isolation. Improving the performance of hearing aids in noise is therefore crucial to enable people with hearing loss to participate fully in society and improve their quality of life.
Related Publications
Are Recent Deep Learning-Based Speech Enhancement Methods Ready to Confront Real-World Noisy Environments?
The 25th Interspeech Conference, Kos Island, Greece, 1-5 September 2024.
Recent advancements in speech enhancement techniques have ignited interest in improving speech quality and intelligibility. However, the effectiveness of recently proposed methods is unclear. In this paper, a comprehensive analysis of modern deep learning-based speech enhancement approaches is presented. Through evaluations using the Deep Noise Suppression and Clarity Enhancement Challenge datasets, we assess the performance of three methods: Denoiser, DeepFilterNet3, and FullSubNet+. Our findings reveal nuanced performance differences among these methods, with varying efficacy across datasets. While objective metrics offer valuable insights, they struggle to represent complex scenarios with multiple noise sources. Leveraging ASR-based methods for these scenarios shows promise but may induce critical hallucination effects. Our study emphasizes the need for ongoing research to refine techniques for diverse real-world environments.
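The evaluation described above reduces, at its core, to scoring enhanced utterances against clean references with objective metrics. A minimal sketch of such a scoring loop, assuming paired 16 kHz clean/enhanced WAV files and the third-party pystoi and pesq packages (directory and file names are illustrative), could look like this:

```python
# Minimal sketch of an objective speech-enhancement evaluation loop.
# Assumes paired 16 kHz clean/enhanced WAV files with matching names and the
# third-party soundfile, pystoi, and pesq packages; paths are illustrative.
from pathlib import Path

import numpy as np
import soundfile as sf
from pystoi import stoi
from pesq import pesq

def evaluate_pairs(clean_dir: str, enhanced_dir: str, fs: int = 16000):
    """Compute STOI and wide-band PESQ for every matching file name."""
    scores = []
    for clean_path in sorted(Path(clean_dir).glob("*.wav")):
        enh_path = Path(enhanced_dir) / clean_path.name
        if not enh_path.exists():
            continue
        clean, fs_c = sf.read(clean_path)
        enh, fs_e = sf.read(enh_path)
        assert fs_c == fs_e == fs, "expected matching 16 kHz audio"
        n = min(len(clean), len(enh))          # align lengths defensively
        clean, enh = clean[:n], enh[:n]
        scores.append({
            "file": clean_path.name,
            "stoi": stoi(clean, enh, fs, extended=False),
            "pesq_wb": pesq(fs, clean, enh, "wb"),
        })
    return scores

if __name__ == "__main__":
    results = evaluate_pairs("clean/", "enhanced/")
    print("mean STOI :", np.mean([r["stoi"] for r in results]))
    print("mean PESQ :", np.mean([r["pesq_wb"] for r in results]))
```

ASR-based evaluation, as the paper notes, would replace these signal-level scores with recognition accuracy on the enhanced audio.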
Incorporating the Digit Triplet Test in A Lightweight Speech Intelligibility Prediction for Hearing Aids.
The 15th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2023), Taipei, Taiwan, 31 October - 3 November 2023.
Recent studies in speech processing often utilize sophisticated methods to obtain high-accuracy results. Although high performance can be achieved, these methods are complex and require computational power that may not be available to a wide range of researchers. In this study, we propose a method that combines low-dimensional acoustic features with recent state-of-the-art acoustic features to predict speech intelligibility in noise for hearing aids. The proposed method is built on a stacking regressor over several traditional machine learning regressors. Unlike other existing works, we utilize the results of the digit triplet test, which is commonly used to measure hearing ability in the presence of noise, to improve the prediction. The proposed method was evaluated on the first Clarity Prediction Challenge dataset, which consists of hearing aid output signals generated in various simulated scenes with interferers. Our experimental results show that the proposed method improves speech intelligibility prediction and that digit triplet test results are beneficial for predicting speech intelligibility in noise.
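The stacking idea described in this abstract maps naturally onto scikit-learn. The sketch below is only illustrative: the features, data, and regressor choices are placeholders, with the digit triplet test (DTT) result simply appended as an extra input feature.

```python
# Illustrative sketch of a stacking regressor for intelligibility prediction
# from low-dimensional acoustic features plus a listener's digit triplet test
# (DTT) score. All data here are random placeholders, not the paper's dataset.
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
acoustic = rng.normal(size=(500, 20))      # placeholder acoustic features
dtt_srt = rng.normal(size=(500, 1))        # placeholder DTT speech reception thresholds
X = np.hstack([acoustic, dtt_srt])         # append the DTT result as one extra feature
y = rng.uniform(0, 100, size=500)          # placeholder intelligibility (% words correct)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),
        ("svr", SVR(C=1.0)),
    ],
    final_estimator=Ridge(alpha=1.0),
)
print(cross_val_score(stack, X, y, cv=5, scoring="neg_root_mean_squared_error"))
```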
Non-Intrusive Speech Intelligibility Prediction Using an Auditory Periphery Model with Hearing Loss.
Applied Acoustics, 2023.
Speech intelligibility prediction methods are necessary for hearing aid development. However, many such prediction methods are categorized as intrusive metrics because they require reference speech as input, which is often unavailable in real-world situations. Additionally, the processing techniques in hearing aids may cause temporal or frequency shifts, which degrade the accuracy of intrusive speech intelligibility metrics. This paper proposes a non-intrusive auditory model for predicting speech intelligibility under hearing loss conditions. The proposed method requires binaural signals from hearing aids and audiograms representing the hearing conditions of hearing-impaired listeners. It also includes additional acoustic features to improve the method’s robustness in noisy and reverberant environments. A two-dimensional convolutional neural network with neural decision forests is used to construct a speech intelligibility prediction model. An evaluation conducted with the first Clarity Prediction Challenge dataset shows that the proposed method performs better than the baseline system.
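As a rough illustration of the non-intrusive setup described above, the following PyTorch sketch feeds binaural spectrogram "images" through a small 2D CNN and concatenates the pooled embedding with an audiogram vector before a regression head. The paper's neural-decision-forest head is replaced here by a plain linear layer for brevity, and all layer sizes are assumptions.

```python
# Minimal PyTorch sketch of a non-intrusive intelligibility predictor:
# binaural spectrograms -> small 2D CNN -> pooled embedding, concatenated with
# an audiogram vector, then a simple regression head. The neural decision
# forest from the paper is replaced by a linear head; sizes are illustrative.
import torch
import torch.nn as nn

class NonIntrusiveSIPredictor(nn.Module):
    def __init__(self, n_audiogram_points: int = 8):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1),   # 2 channels = left/right ear
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                       # pool to a fixed-size embedding
        )
        self.head = nn.Sequential(
            nn.Linear(32 + n_audiogram_points, 64),
            nn.ReLU(),
            nn.Linear(64, 1),                              # predicted intelligibility score
        )

    def forward(self, binaural_spec: torch.Tensor, audiogram: torch.Tensor) -> torch.Tensor:
        # binaural_spec: (batch, 2, freq, time); audiogram: (batch, n_audiogram_points)
        emb = self.cnn(binaural_spec).flatten(1)
        return self.head(torch.cat([emb, audiogram], dim=1)).squeeze(1)

model = NonIntrusiveSIPredictor()
spec = torch.randn(4, 2, 64, 200)        # dummy binaural log-mel spectrograms
audiogram = torch.randn(4, 8)            # dummy audiogram thresholds (normalized)
print(model(spec, audiogram).shape)      # torch.Size([4])
```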
Auditory Model Optimization with Wavegram-CNN and Acoustic Parameter Models for Nonintrusive Speech Intelligibility Prediction in Hearing Aids.
The 31st European Signal Processing Conference (EUSIPCO 2023), Helsinki, Finland.
Nonintrusive speech intelligibility (SI) prediction is essential for evaluating many speech technology applications, including hearing aid development. In this study, several factors related to hearing perception are investigated to predict SI. In the proposed method, we integrate a physiological auditory model of the two ears (binaural EarModel), a wavegram-CNN model, and an acoustic parameter model. The refined EarModel does not require clean speech as input (blind method). In EarModel, the perception caused by hearing loss is simulated based on audiograms. Meanwhile, the wavegram-CNN and acoustic parameter models represent factors related to the speech spectrum and acoustics, respectively. The proposed method is evaluated on the scenario from the 1st Clarity Prediction Challenge (CPC1). The results show that the proposed method outperforms the intrusive baseline MBSTOI and HASPI methods in terms of the Pearson coefficient (ρ), RMSE, and R2 score in both closed-set and open-set tracks. Listener-wise evaluation results show that the average ρ could be improved by more than 0.3 using the proposed method.
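The reported comparison rests on three standard regression metrics. A small helper for computing them, assuming listener correctness scores and model predictions as NumPy arrays (the values below are placeholders), might be:

```python
# Hypothetical helper for the three reported metrics: Pearson's correlation,
# RMSE, and the R2 score between measured correctness and predicted
# intelligibility. The arrays here are placeholders, not CPC1 data.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, r2_score

def si_prediction_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    rho, _ = pearsonr(y_true, y_pred)
    return {
        "pearson_rho": rho,
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "r2": r2_score(y_true, y_pred),
    }

y_true = np.array([80.0, 55.0, 100.0, 20.0, 65.0])   # measured % words correct
y_pred = np.array([75.0, 60.0, 95.0, 30.0, 70.0])    # model predictions
print(si_prediction_metrics(y_true, y_pred))
```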
OBISHI: Objective Binaural Intelligibility Score for the Hearing Impaired.
The 18th Australasian International Conference on Speech Science and Technology (SST2022), Canberra, Australia, 13-16 December 2022.
Speech intelligibility prediction for both normal hearing and hearing impairment is very important for hearing aid development. The first Clarity Prediction Challenge (CPC1) was initiated to evaluate the speech intelligibility of signals produced by hearing aid systems. The modified binaural short-time objective intelligibility (MBSTOI) and hearing aid speech prediction index (HASPI) were provided in the CPC1 as baselines for speech intelligibility prediction. This paper proposes a method to predict speech intelligibility scores, namely OBISHI. OBISHI is an intrusive (non-blind) objective measure that receives binaural speech input and considers hearing-impaired characteristics. In addition, a pre-trained automatic speech recognition (ASR) system is utilized to infer the difficulty of utterances regardless of the hearing loss condition. We also integrate the hearing loss model from the Cambridge auditory group and a gammatone filterbank-based prediction model. The evaluation compares the intelligibility scores predicted by OBISHI and by the baseline MBSTOI and HASPI with the actual correctness measured in listening tests. In general, the results show that OBISHI outperforms the baseline MBSTOI and HASPI, improving classification accuracy by approximately 10% in terms of F1 score.
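One ingredient named above, using a pre-trained ASR system to gauge utterance difficulty independently of hearing loss, can be sketched with off-the-shelf tools. The checkpoint, file path, and prompt below are illustrative and not the paper's actual configuration:

```python
# Sketch of an ASR-based difficulty cue: transcribe an utterance with a
# pre-trained ASR model and score it against the prompt, so hard-to-recognize
# material yields a higher error rate. Model name and paths are illustrative.
from transformers import pipeline
from jiwer import wer

asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

def utterance_difficulty(wav_path: str, prompt: str) -> float:
    """Return a word error rate; higher values suggest a harder utterance."""
    hypothesis = asr(wav_path)["text"].lower()
    return wer(prompt.lower(), hypothesis)

# Example usage with an illustrative file and prompt:
# print(utterance_difficulty("scene_001_mix.wav", "the boy ran across the street"))
```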
Speech Intelligibility Prediction for Hearing Aids Using an Auditory Model and Acoustic Parameters.
2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Chiang Mai, Thailand, 7-10 November 2022.
Objective speech intelligibility (SI) metrics for hearing-impaired people play an important role in hearing aid development. Work on improving SI prediction also became the basis of the first Clarity Prediction Challenge (CPC1). This study investigates a physiological auditory model called EarModel and acoustic parameters for SI prediction. EarModel is utilized because it provides advantages in estimating human hearing, both normal and impaired. The hearing-impaired condition is simulated in EarModel based on audiograms; thus, the SI perceived by hearing-impaired people is predicted more accurately. Moreover, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) and WavLM are included as additional acoustic parameters for estimating the difficulty levels of given utterances, to achieve improved prediction accuracy. The proposed method is evaluated on the CPC1 database. The results show that the proposed method outperforms both the baseline and the hearing aid speech prediction index (HASPI). Additionally, an ablation test shows that incorporating eGeMAPS and WavLM contributes significantly to the prediction model, increasing the Pearson correlation coefficient by more than 15% and decreasing the root-mean-square error (RMSE) by more than 10.00 in both closed-set and open-set tracks.
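Both additional feature streams mentioned here are available through open-source toolkits, so a hedged sketch of extracting them for a single hearing aid output signal might look as follows (the checkpoint, file path, and mean-pooling choice are assumptions rather than the paper's exact pipeline):

```python
# Hedged sketch of extracting the two feature streams named above for one
# utterance: eGeMAPS functionals via the openSMILE Python package and a
# pooled WavLM embedding via HuggingFace transformers. Checkpoint and file
# path are illustrative; this is not the paper's exact feature pipeline.
import librosa
import opensmile
import torch
from transformers import AutoFeatureExtractor, WavLMModel

wav_path = "ha_output.wav"                      # illustrative hearing aid output signal
signal, sr = librosa.load(wav_path, sr=16000)   # WavLM expects 16 kHz input

# eGeMAPS functionals (utterance-level acoustic descriptor)
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
egemaps = smile.process_signal(signal, sr)      # pandas DataFrame, one row

# Pooled WavLM embedding (mean over frames of the last hidden state)
extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus")
wavlm = WavLMModel.from_pretrained("microsoft/wavlm-base-plus")
inputs = extractor(signal, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    wavlm_emb = wavlm(**inputs).last_hidden_state.mean(dim=1)   # shape (1, 768)

print(egemaps.shape, wavlm_emb.shape)
```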