Diagnosing Dysarthria With Long Short-Term Memory Networks
September 15-19, 2019, Graz, Austria
Abstract
This paper proposes the use of Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units for determining whether Mandarin-speaking individuals are afflicted with a form of Dysarthria based on samples of syllable pronunciations. Several LSTM network architectures are evaluated on this binary classification task, using accuracy and Receiver Operating Characteristic (ROC) curves as metrics. The LSTM models are shown to significantly improve upon a baseline fully connected network, reaching over 90% area under the ROC curve on the task of classifying new speakers, when a sufficient number of cepstrum coefficients is used. The results show that the LSTM's ability to leverage temporal information within its input makes for an effective step in the pursuit of accessible Dysarthria diagnoses.

Figure 1: From a waveform example X to its classification Y in the proposed model architecture.
Index Terms: Dysarthria, RNN, LSTM, speech processing
X_0 and the last output h_T is provided to the logistic regression layer.

We experimented with several variants of the LSTM model, including adding layers and using bidirectional LSTMs. For the models with one layer, L2 regularization was used. The two-layer model employed dropout [4, 5] between the LSTM layers, as well as between the last LSTM layer and the logistic regression. Bidirectional LSTM networks perform two concurrent passes on the data, left to right and right to left. The output vectors produced by the two passes are concatenated and fed to the logistic regression layer.

3. Evaluation Methodology

The evaluation data consists of samples of syllables recorded from 69 Mandarin-speaking adults, 38 male and 31 female. The number of individuals in each class is presented in Table 1, together with the corresponding number of recorded syllables. The participants were from Jinan University School of Medicine and included 31 native Mandarin-speaking patients (19 males and 12 females) with post-stroke Dysarthria. The age of the dysarthric speakers ranged from 25 to 83 years old [mean ± SD: 56.74 ± 16.40 years]. All participants went through physical examination, Frenchay Dysarthria Assessment, and other auxiliary examinations (such as brain CT, MRI). Before the stroke occurred, all patients had no speech-related impairments and were able to communicate fluently in Mandarin. They had no alexia, visual, or severe auditory comprehension impairments, and had pure-tone thresholds at 500, 1000, and 2000 Hz of ≤ 25 dB HL in at least one ear. The control group included 38 healthy adults (HA) (19 males and 19 females) in a similar age range (21 to 76 years old; mean ± SD: 45.89 ± 13.02 years). Some of the family members of the Dysarthria group were recruited into the HA group. They all had pure-tone thresholds at 500, 1000, and 2000 Hz of ≤ 25 dB HL in at least one ear, with no reported hearing or speech disorders. More details about the demographic information of the participants and the acoustic properties of the speech samples can be found in [6]. Informed consent was obtained from all participants. All research was performed in accordance with relevant guidelines and regulations.

Note that although the positive or negative labels are originally assigned only to speakers, the models are trained and tested using syllables as input. To assign labels to syllables during training, we propagate the speaker label to all the syllables recorded from that speaker. This is bound to introduce label noise in the positively labeled syllables, as not all syllables from a speaker afflicted with Dysarthria exhibit abnormal speech production. If a speaker is seen to correspond to a bag of syllables, the problem corresponds to a multiple instance learning (MIL) setting [7, 8]. In this paper, we use the simple MIL approach of projecting bag labels to all syllable instances in the bag, leaving the use of more sophisticated MIL methods for future work.

The dataset was used for the training and evaluation of four models, as follows:

1. Baseline: A fully connected feedforward neural network with one hidden layer.
2. LSTM-1: Single-layer, unidirectional LSTM.
3. LSTM-2: Double-layer, unidirectional LSTM.
4. BiLSTM-1: Single-layer, bidirectional LSTM.

All models use a hidden layer size of 200. They are trained using Adam [9] for 40 epochs on mini-batches of size 64. The training objective is formulated as the syllable-level cross-entropy loss between the predictions and the ground truth provided by the medical practitioners who collected the data. While only the individuals were manually labeled as having Dysarthria (positive) or not (negative), the label for each individual was also assigned to all the syllables coming from that individual, and the cross-entropy loss was formulated using syllables as examples.

To prevent overfitting, a form of early stopping was employed, where training is stopped when the ratio of the current validation error ε_curr to the lowest error seen thus far ε_min exceeds a threshold 1 + α, i.e. ε_curr / ε_min > 1 + α, where α = 0.075. A grace period is used, such that training is only stopped if the threshold is met for 5 epochs in a row.

4. Experiment I: Known Speakers

We first compared the LSTM models against the baseline fully connected network on the relatively easier task of non-novel speakers. Here, the entire set of syllables from the dataset is shuffled and partitioned into the training, testing, and validation sets using a number-of-syllables ratio of [Link], respectively. Since the dataset is partitioned at syllable level, it is possible for a patient to have their syllables partitioned among the training, validation, and test sets. Thus, syllables that appear at test time may come from patients that have been observed at training time.

We employed the syllable-level accuracy, precision, and recall as metrics to judge the performance of each model [10]. Accuracy is the percentage of correct syllable labels predicted by the system. Precision and recall were also considered due to the medical nature of our experiments. That is, most people do not suffer from Dysarthria, but it is the instances in which one does that are important to classify correctly. Precision is the percentage of correct positive predictions (true positives) out of all the positive predictions (true positives + false positives). Recall is the percentage of correct positive predictions out of all the positive examples (true positives + false negatives). Because individuals who receive a negative prediction (i.e., who do not suffer from Dysarthria) are less likely to seek a second opinion, we are especially interested in a higher recall.

The baseline fully connected model obtains 79.0% accuracy, which is a significant improvement over the 53.4% of the majority classifier. Table 2 also shows the results for each LSTM model.
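To make the architecture concrete, the single-layer unidirectional variant (LSTM-1) can be sketched in PyTorch. This is an illustrative sketch, not the authors' implementation; the class and variable names are ours, while the sizes follow the paper (cepstral feature vectors of 13 coefficients per frame, hidden size 200, last output h_T fed to a logistic regression layer):

```python
import torch
import torch.nn as nn

class LSTM1(nn.Module):
    """Illustrative sketch of LSTM-1: a single-layer, unidirectional LSTM
    over a syllable's sequence of cepstral feature vectors; the last
    output h_T feeds a logistic regression layer."""

    def __init__(self, n_features=13, hidden_size=200):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.logreg = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (batch, T, n_features) -> per-syllable probability of Dysarthria
        outputs, _ = self.lstm(x)    # (batch, T, hidden_size)
        h_T = outputs[:, -1, :]      # last output h_T
        return torch.sigmoid(self.logreg(h_T)).squeeze(-1)
```

The two-layer and bidirectional variants would pass `num_layers=2` (with dropout) or `bidirectional=True` to `nn.LSTM`, with the final layer's input size doubled in the bidirectional case to accommodate the concatenated left-to-right and right-to-left outputs.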
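The early stopping rule (stop once ε_curr / ε_min > 1 + α, with α = 0.075, holds for 5 epochs in a row) can be sketched in plain Python; the function name and the list-based interface are ours, not the authors':

```python
def should_stop(val_errors, alpha=0.075, patience=5):
    """Early stopping sketch: return True once the ratio of the current
    validation error to the lowest error seen so far exceeds 1 + alpha
    for `patience` consecutive epochs."""
    min_err = float("inf")
    streak = 0
    for err in val_errors:
        min_err = min(min_err, err)      # lowest validation error so far
        if err / min_err > 1 + alpha:    # threshold exceeded this epoch
            streak += 1
            if streak >= patience:       # grace period exhausted
                return True
        else:
            streak = 0                   # improvement resets the grace period
    return False
```

For example, a run whose validation error bottoms out at 0.9 and then plateaus at 1.0 (a ratio of about 1.11 > 1.075) triggers the stop after five such epochs, whereas a monotonically decreasing error never does.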
LSTM-1 and LSTM-2 achieve similar performance, outperforming the baseline and making a marginal improvement upon the Bi-LSTM-1 model.

5. Experiment II: Novel Speakers

While the LSTM models were shown to outperform the baseline on the task of classifying syllables from known speakers, we are also interested in their performance on novel speakers. In the experiment from Section 4, the training set and the test set may contain syllables from the same speaker. To more accurately match the application to novel speakers, in the experiment from this section the training and test sets were created by partitioning the set of speakers. As such, an individual's syllables appear either in the test set or in the training set, but not in both.

Because the number of speakers in the dataset is relatively small, we opted to evaluate the models using 10-fold cross-validation. We randomly sampled 9 of the 69 speakers to use as a validation set. The remaining 60 were partitioned into 10 groups of 6. In each of the 10 evaluation rounds, the models were trained on 54 speakers and tested on a different group of 6 novel speakers. The trained models are then evaluated in two scenarios: syllable-level and speaker-level classification.

5.1. Syllable-Level Evaluation

In the syllable-level classification, the trained models are evaluated by how well they classify syllables from the speakers in the test set. This is similar to the experiment from Section 4, except that the test syllables now come from novel speakers. Figure 2 shows the receiver-operating characteristic (ROC) and precision-recall (PR) curves.

Figure 2: Receiver-operating characteristic (ROC) and precision-recall (PR) curves for syllable-level classification.

To gain perspective on the ROC behavior, we consider a model which produces a positive classification with probability p. The majority classifier can be seen as the extreme case, where p = 1. As p increases from 0 to 1, the red ROC line in Figure 2 is produced, with an area under the curve (AUC) of 0.5. The three LSTM models clearly improve upon this baseline for all three methods of inference, with LSTM-1 edging out the other two models. In terms of AUC, the three systems obtain the following scores: 75.4 for LSTM-1, 69.5 for LSTM-2, and 71.4 for Bi-LSTM-1.

5.2. Speaker-Level Evaluation

In the speaker-level classification, we take the logistic regression outputs for all syllables belonging to a test speaker and aggregate them into a single probability score that can be used to classify the speaker. To achieve this, we investigated two aggregation methods: soft-majority and a normalized version of noisy-OR. Given a speaker with m syllables, let σ_k be the logistic regression output for the k-th syllable of that speaker. Then the two aggregation methods compute the speaker-level probability as follows:

    soft-majority = (1/m) Σ_{k=1}^{m} σ_k                         (1)

    noisy-OR = 1 − exp( (1/m) Σ_{k=1}^{m} log(1 − σ_k) )          (2)

Because the traditional noisy-OR calculation would be affected by the number of syllables each speaker has, we computed it in log-space and normalized the probability of the negative class by the number of that speaker's syllables. This allowed us to directly compare the noisy-OR scores between speakers with a varying number of examples.

Figure 3 presents the receiver-operating characteristic (ROC) and precision-recall (PR) curves for the soft-majority method of inference. The noisy-OR method produced identical results, therefore its curves are not shown.

Figure 3: Receiver-operating characteristic (ROC) and precision-recall (PR) curves for speaker-level evaluation.

The LSTM-1 model again obtains the best results. The first line in Table 3 shows the performance of the three LSTM models in terms of the area under the ROC curve (AUC). The speaker-level performance is higher than at syllable level, likely because not all syllables from speakers afflicted with Dysarthria exhibit abnormal speech production. Because an accurate diagnosis cannot be expected to result from a single syllable, the speaker-level method is more appropriate for practical purposes.

Table 3: AUC scores: speaker-level vs. syllable-level.

                 LSTM-1   LSTM-2   Bi-LSTM-1
Speaker-level    85.4     78.5     84.7
Syllable-level   75.4     65.9     71.4

5.3. Effects of Syllable Types on Classification Accuracy

There are three types of syllables in the dataset: (1) syllables with monophthongs, (2) syllables with compound vowels, and (3) syllables with consonant-/a/. In order to evaluate the classification accuracy based on the various types of syllables, we created three combinations of the syllable dataset. The first combination (no prefix) consists of all three types of syllables. The second combination (prefixed with 'cv-') does not contain syllables with compound vowels. The third combination (prefixed with 'c/a/-') does not contain syllables with consonant-/a/. In this experiment, we test how well the models perform when they are trained on the three different combinations of syllable types.
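The speaker-level cross-validation protocol above (9 validation speakers held out, the remaining 60 split into 10 disjoint test groups of 6) can be sketched as follows; the function name and generator interface are ours, not the authors':

```python
import random

def speaker_folds(speakers, n_folds=10, n_val=9, seed=0):
    """Sketch of the speaker-disjoint split: hold out n_val speakers for
    validation, split the rest into n_folds disjoint test groups, and
    yield a (train, val, test) partition for each evaluation round."""
    rng = random.Random(seed)
    pool = list(speakers)
    rng.shuffle(pool)
    val, rest = pool[:n_val], pool[n_val:]
    size = len(rest) // n_folds
    folds = [rest[i * size:(i + 1) * size] for i in range(n_folds)]
    for i, test in enumerate(folds):
        # train on every group except the current test group
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, val, test
```

With 69 speakers this yields 10 rounds of 54 training, 9 validation, and 6 test speakers, and because the split is by speaker, no individual's syllables can leak between training and test.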
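As a concrete reading of the two aggregation rules, the following sketch implements soft-majority and the normalized noisy-OR, assuming the log-space normalization amounts to averaging log(1 − σ_k) over a speaker's m syllables before exponentiating; the function names are ours:

```python
import math

def soft_majority(sigmas):
    """Mean of the per-syllable positive probabilities, as in Eq. (1)."""
    return sum(sigmas) / len(sigmas)

def normalized_noisy_or(sigmas):
    """Noisy-OR computed in log-space, as in Eq. (2): the negative-class
    log-probability is averaged over the speaker's syllables so that
    scores stay comparable across speakers with different numbers of
    examples."""
    m = len(sigmas)
    avg_log_neg = sum(math.log(1.0 - s) for s in sigmas) / m
    return 1.0 - math.exp(avg_log_neg)
```

Note that when every syllable receives the same score σ, both methods reduce to σ, and both stay within [0, 1] regardless of the number of syllables m.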
Figure 4 shows the receiver-operating characteristic (ROC) and precision-recall (PR) curves for the three LSTM models and the three different combinations of syllable types. The corresponding AUC scores are shown in Table 4.

Figure 4: Receiver-operating characteristic (ROC) and precision-recall (PR) curves using all syllables vs. without syllables with compound vowels (prefixed with 'cv-') vs. without syllables with consonant-/a/ (prefixed with 'c/a/-').

Table 4: Speaker-level AUC scores over all syllables (All) vs. without syllables with compound vowels (No 'cv') vs. without syllables with consonant-/a/ (No 'c/a/').

             All    No 'cv'   No 'c/a/'
LSTM-1       85.4   92.3      62.1
LSTM-2       78.5   90.2      84.5
Bi-LSTM-1    84.7   92.0      85.9

The models performed significantly better according to their AUC scores when syllables with compound vowels were removed. All three models scored above 90% when trained without this type of syllable. Therefore, it may be a reasonable heuristic not to include syllables with compound vowels when diagnosing a Dysarthria patient. This intuitively follows from the observation that, even for healthy speakers, these syllables are more difficult to produce and the variability of their acoustic properties is greater than for syllables with monophthongs.

5.4. Varying the Number of Cepstrum Coefficients

While including N = 13 cepstrum coefficients in each feature has produced promising results, there may still be room for improvement by adding more coefficients. To this end, three 10-fold cross-validation evaluations were conducted in the same manner as before, with N = 13, 19, and 25 coefficients used as input, respectively. When fewer than 25 cepstrum coefficients are used, they are taken starting from the cepstrum with the lowest quefrency. Table 5 shows the speaker-level AUC scores for the three LSTM models using the soft-majority inference method.

Table 5: Speaker-level AUC scores for different numbers of cepstrum coefficients.

             N = 13   N = 19   N = 25
LSTM-1       85.4     90.1     81.7
LSTM-2       78.5     88.2     87.1
Bi-LSTM-1    84.7     88.4     90.4

Adding more cepstrum coefficients leads to substantial improvements in the performance of Bi-LSTM-1, matching LSTM-1's best performance. LSTM-1 and LSTM-2 show a similar behavior, in the sense that their maximum performance is achieved for 19 coefficients. When all 25 coefficients are used, their performance decreases, which could be due to a lack of capacity.

6. Related Work

Carmichael et al. [11] employed multilayer perceptrons and decision trees to classify the different forms of Dysarthria, using as input a computerised Frenchay Dysarthria Assessment (CFDA) profile, essentially a vector of articulatory dysfunction values measured using acoustic signal processing techniques. Unlike our work, however, the system is trained and tested on a distribution of English-speaking people already known to have some form of Dysarthria. Prior to this, an effort was made to classify speakers into one of the categories of Dysarthria using a manual Frenchay Dysarthria Assessment of each patient as input [3][12]. The more advanced topic of recognizing speech produced by someone with Dysarthria using RNNs has also been investigated recently for English-speaking individuals, using Elman recurrent neural networks in [13] and a hybrid deep neural network – hidden Markov model (DNN-HMM) architecture in [14]. Wu et al. [15] presented a personalized model adaptation for automatic speech recognition (ASR) targeted at Mandarin-speaking individuals afflicted with articulation disorders due to mild-to-moderate hearing impairment.

7. Conclusion and Future Work

This paper investigated the effectiveness of three LSTM networks, two unidirectional and one bidirectional, for the task of Dysarthria diagnosis based on recordings of syllables from both afflicted and healthy Mandarin speakers. In the first experiment, all LSTM architectures outperformed a fully connected baseline when evaluated using syllable-level accuracy, with the bidirectional variant slightly trailing the unidirectional variants. The second experiment assumes the test syllables come from novel speakers, and evaluates the three LSTM models at both syllable level and speaker level. When the syllables with compound vowels are removed from the dataset, all models obtain over 90% AUC. Furthermore, we found that the LSTM models' performance could be improved by increasing the number of cepstrum coefficients. While these methods may not yet be practical as a stand-alone medical test, they do suggest that LSTM networks may provide a fruitful avenue for the realization of autonomous Dysarthria diagnosis.

ZCA whitening is employed as a pre-processing step in many audio classification tasks [16][17][18]; as such, it is a compelling next step in an effort to improve performance. CNNs can often perform competitively on sequence processing tasks [19], therefore we plan to comparatively evaluate CNNs and long-term recurrent CNNs [20], as well as dilated RNNs [21]. The model presented in [22] takes features much closer to the raw waveform when compared to MFCCs. Applying this approach to Dysarthria classification may also prove to be effective. The type of training data available for speaker classification falls under the multiple instance learning (MIL) setting [7, 23]. Correspondingly, we plan to use LSTMs with models that are specifically designed for the MIL setting.

8. Acknowledgements

This study was supported in part by the NIH NIDCD Grant No. R15-DC014587.
9. References

[1] S. Li, Speech Therapy. People's Health Press, 2013.

[2] P. Enderby and P. Davies, "Communication disorders: Planning a service to meet the needs," International Journal of Language and Communication Disorders, vol. 4, no. 3, pp. 301–331, 1989.

[3] P. Enderby, "Frenchay Dysarthria assessment," British Journal of Disorders of Communication, vol. 15, no. 3, pp. 165–173, 1980.

[4] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, Jan. 2014. [Online]. Available: [Link]

[5] Y. Gal and Z. Ghahramani, "A theoretically grounded application of dropout in recurrent neural networks," in Proceedings of the 30th International Conference on Neural Information Processing Systems, ser. NIPS'16, 2016, pp. 1027–1035. [Online]. Available: [Link]

[6] Z. Mou, Z. Chen, J. Yang, and L. Xu, "Acoustic properties of vowel production in Mandarin-speaking patients with post-stroke Dysarthria," Scientific Reports, vol. 8, no. 14188, 2018.

[7] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, "Solving the multiple instance problem with axis-parallel rectangles," Artificial Intelligence, vol. 89, no. 1-2, pp. 31–71, Jan. 1997.

[8] S. Ray, S. Scott, and H. Blockeel, "Multiple-instance learning," in Encyclopedia of Machine Learning and Data Mining, 2017, pp. 882–892.

[9] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proceedings of the 3rd International Conference on Learning Representations (ICLR).

[10] L. Torgo and R. Ribeiro, "Precision and recall for regression," in International Conference on Discovery Science. Springer, 2009, pp. 332–346.

[11] J. Carmichael, V. Wan, and P. D. Green, "Combining neural network and rule-based systems for Dysarthria diagnosis," in Interspeech, 2008, pp. 2226–2229.

[12] J. N. Carmichael, Introducing Objective Acoustic Metrics for the Frenchay Dysarthria Assessment Procedure. University of Sheffield, 2007.

[13] S. S. Nidhyananthan, R. S. S. Kumari, and V. Shenbagalakshmi, "Assessment of Dysarthric speech using Elman back propagation network (recurrent network) for speech recognition," International Journal of Speech Technology, vol. 19, no. 3, pp. 577–583, 2016.

[14] C. España-Bonet and J. A. Fonollosa, "Automatic speech recognition with deep neural networks for impaired speech," in Advances in Speech and Language Technologies for Iberian Languages: Third International Conference, IberSPEECH 2016, Lisbon, Portugal, November 23-25, 2016. Springer, 2016, pp. 97–107.

[15] C.-H. Wu, H.-Y. Su, and H.-P. Shen, "Articulation-disordered speech recognition using speaker-adaptive acoustic models and personalized articulation patterns," ACM Transactions on Asian Language Information Processing, vol. 10, no. 2, pp. 7:1–7:19, Jun. 2011.

[16] Y. L. Gwon, W. M. Campbell, D. Sturim, and H. Kung, "Language recognition via sparse coding," in Interspeech, 2016.

[17] C. Chen, R. Bunescu, L. Xu, and C. Liu, "Tone classification in Mandarin Chinese using convolutional neural networks," in Interspeech, 2016, pp. 2150–2154.

[18] O. Vinyals and L. Deng, "Are sparse representations rich enough for acoustic modeling?" in Interspeech, 2012, pp. 2570–2573.

[19] W. Yin, K. Kann, M. Yu, and H. Schütze, "Comparative study of CNN and RNN for natural language processing," CoRR, vol. abs/1702.01923, 2017.

[20] J. Donahue, L. A. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 677–691, Apr. 2017. [Online]. Available: [Link]

[21] S. Chang, Y. Zhang, W. Han, M. Yu, X. Guo, W. Tan, X. Cui, M. Witbrock, M. A. Hasegawa-Johnson, and T. S. Huang, "Dilated recurrent neural networks," in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 77–87. [Online]. Available: [Link]

[22] J. Lee, J. Park, K. L. Kim, and J. Nam, "Sample-level deep convolutional neural networks for music auto-tagging using raw waveforms," in Proceedings of the 14th Sound and Music Computing Conference (SMC), Espoo, Finland, 2017. [Online]. Available: [Link]

[23] S. Ray, S. Scott, and H. Blockeel, "Multiple-instance learning," in Encyclopedia of Machine Learning and Data Mining, 2014, pp. 1–13.