Short-Time Processing of Speech Signals

The document discusses short-time processing of speech signals, emphasizing the importance of windowing and time-frequency transforms like DFT and DCT. It outlines the step-by-step process for applying these transforms, their applications in speech processing, and compares DFT with DCT and MFCC. Additionally, it highlights the advantages and limitations of various cepstral coefficients used in speech analysis.

Uploaded by itsragno
Copyright © All Rights Reserved

Short-time processing of speech signals
Introduction
• We need to split the signal into shorter segments and
apply a windowing function to each segment.
• Extra steps are needed when we want to modify the
signal:
• windowing
• time-frequency transform such as the DFT (optional)
• apply the desired processing
• inverse time-frequency transform (when the DFT was used)
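The steps listed above can be sketched in a few lines. This is only an illustration under assumptions that the slides do not state: a periodic Hann window, a frame length of 16 with a hop of 8, and a direct O(N²) DFT written with the standard library. The names `process_frames` and `modify` are hypothetical.

```python
import cmath
import math

def hann(n_len):
    """Periodic Hann window of length n_len (assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]

def dft(x):
    """Direct DFT; slow but self-contained."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len))
            for k in range(n_len)]

def idft(spec):
    """Inverse DFT, returning the real part of each sample."""
    n_len = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * n / n_len)
                for k in range(n_len)).real / n_len
            for n in range(n_len)]

def process_frames(signal, frame_len, hop, modify):
    """Split -> window -> DFT -> modify -> inverse DFT, frame by frame."""
    win = hann(frame_len)
    out = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * win[n] for n in range(frame_len)]
        out.append(idft(modify(dft(frame))))
    return out

# Toy input: a sinusoid; the "processing" here is the identity.
signal = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
frames = process_frames(signal, frame_len=16, hop=8, modify=lambda s: s)
```

The frames returned here are still windowed; recombining them into one output signal is a separate step that this sketch does not attempt.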
• Time-frequency transforms such as the discrete Fourier
transform (DFT) and the discrete cosine transform
(DCT) are orthonormal (with suitable scaling) and have
well-known fast algorithms for their inverses.
• The main challenge is thus the “reverse of windowing”,
whatever that might be.
• The direct approach of just multiplying with the inverse
of the windowing function has a problem: typical windows
go to zero at the frame edges, so the inverse is unbounded
there and amplifies any error introduced by the processing.
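A small numerical experiment can make the problem with the direct inverse concrete. The sketch below assumes a symmetric Hann window and a constant additive error of 1e-3 standing in for whatever the processing step changes; both are illustrative choices, not taken from the slides.

```python
import math

def hann_symmetric(n_len):
    """Symmetric Hann window; its end samples are (numerically) zero."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]

frame_len = 16
win = hann_symmetric(frame_len)
frame = [1.0] * frame_len                      # toy input frame
windowed = [x * w for x, w in zip(frame, win)]

# Assumed stand-in for processing: a small constant error on every sample.
error = 1e-3
processed = [v + error for v in windowed]

# Naive "reverse windowing": divide by the window where it is nonzero.
recovered_error = [
    abs(p / w - x) if w > 1e-12 else math.inf
    for p, w, x in zip(processed, win, frame)
]

print("error near the centre:", recovered_error[frame_len // 2])
print("error next to the edge:", recovered_error[1])
print("error at the edge:", recovered_error[0])
```

The error after division equals the original error divided by the window value, so it grows as the window shrinks and is unbounded where the window is zero.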
Discrete Fourier transform
DFT including sampling interval
Applying DFT for speech signals
Challenges in speech signals
The Step-by-Step Process in Detail
Key Applications of DFT/STFT in Speech Processing
Limitations and Considerations
Discrete Cosine Transform (DCT)
Mel-Frequency Cepstral Coefficients (MFCC)
Key Application: Mel-Frequency Cepstral Coefficients (MFCCs)
Mathematical Formulation
Differences Between DFT and DCT

Aspect                Discrete Fourier Transform (DFT)      Discrete Cosine Transform (DCT)
Output Type           Complex-valued coefficients           Real-valued coefficients
Basis Functions       Complex exponentials: e^(-j2πkn/N)    Real cosines: cos(πk(2n+1)/(2N))
Phase Information     Preserves both magnitude and phase    No separate phase term (real, signed coefficients)
Energy Compaction     Good for periodic signals             Excellent for correlated signals
Boundary Conditions   Implicitly periodic extension         Implicitly even-symmetric extension
Computational Load    Higher (complex arithmetic)           Lower (real arithmetic only)
Primary Use in        Spectral analysis,                    Feature extraction (MFCCs),
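The output-type and energy-compaction rows can be checked numerically. The sketch below applies both transforms, using the basis functions from the table, to a short ramp chosen here as an example of a strongly correlated signal. The orthonormal scaling and the helper `energy_in_largest` are assumptions made for the illustration.

```python
import cmath
import math

def dft_ortho(x):
    """Orthonormal DFT with basis e^(-j 2 pi k n / N)."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len))
            / math.sqrt(n_len)
            for k in range(n_len)]

def dct2_ortho(x):
    """Orthonormal DCT-II with basis cos(pi k (2n+1) / (2N))."""
    n_len = len(x)
    out = []
    for k in range(n_len):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n_len)
        out.append(scale * sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_len))
                               for n in range(n_len)))
    return out

def energy_in_largest(coeffs, m):
    """Fraction of total energy held by the m largest-magnitude coefficients."""
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    return sum(energies[:m]) / sum(energies)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # smooth, correlated ramp
X_dft = dft_ortho(x)
X_dct = dct2_ortho(x)

print("DFT output is complex:", any(abs(c.imag) > 1e-9 for c in X_dft))
print("DCT output is real:", all(isinstance(c, float) for c in X_dct))
print("energy in 3 largest DFT coefficients:", energy_in_largest(X_dft, 3))
print("energy in 2 largest DCT coefficients:", energy_in_largest(X_dct, 2))
```

For this particular input the DCT concentrates more of the energy in fewer coefficients; other signals, such as a pure sinusoid aligned with a DFT bin, can favour the DFT instead.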
When to Use DCT
MFCC Pipeline Showing Both Transforms
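A minimal sketch of such a pipeline is given below: the DFT turns a windowed frame into a power spectrum, and the DCT turns the log mel energies into cepstral coefficients. The slide body is not available, so every specific choice here is an assumption: the Hamming window, the common 2595·log10(1 + f/700) mel formula, 10 filters, 5 coefficients, an unscaled DCT-II, and all function names.

```python
import cmath
import math

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Transform 1 (DFT): |X[k]|^2 for bins 0..N/2 of a Hamming-windowed frame."""
    n_len = len(frame)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]
    xw = [s * w for s, w in zip(frame, win)]
    return [abs(sum(xw[n] * cmath.exp(-2j * math.pi * k * n / n_len)
                    for n in range(n_len))) ** 2
            for k in range(n_len // 2 + 1)]

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    mel_max = hz_to_mel(fs / 2)
    mel_pts = [mel_max * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [mel_to_hz(m) * n_fft / fs for m in mel_pts]   # fractional bin positions
    bank = []
    for i in range(1, n_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        bank.append([max(0.0, min((k - lo) / (mid - lo), (hi - k) / (hi - mid)))
                     for k in range(n_fft // 2 + 1)])
    return bank

def mfcc(frame, fs, n_filters=10, n_ceps=5):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), fs)
    # Log of the energy collected by each mel filter (small floor avoids log(0)).
    log_e = [math.log(sum(w * p for w, p in zip(filt, spec)) + 1e-10) for filt in bank]
    # Transform 2 (DCT-II, unscaled) over the filter index.
    return [sum(log_e[m] * math.cos(math.pi * k * (2 * m + 1) / (2 * n_filters))
                for m in range(n_filters))
            for k in range(n_ceps)]

fs = 8000
frame = [math.sin(2 * math.pi * 440 * n / fs) for n in range(64)]
coeffs = mfcc(frame, fs)
print(coeffs)
```

Practical front ends usually add pre-emphasis, longer frames, more filters and a fast transform; those are left out to keep the two transforms visible.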
This Combination Works So Well
They Answer Different Questions
Step 2: Frequency Domain Transformation
Role of DCT in MFCC
Visual Example of DCT Compression
DCT Over DFT
The Complete MFCC Feature Vector
Practical Example
The DCT-MFCC Relationship
MFCC Example Problem
Step-by-Step Solution
n   (2n+1)   Angle (rad)           cos(angle)               X[n]   Product
0   1        π×1/12 = 0.2618       cos(0.2618) = 0.9659     2.1    2.028
1   3        π×3/12 = 0.7854       cos(0.7854) = 0.7071     3.4    2.404
2   5        π×5/12 = 1.3090       cos(1.3090) = 0.2588     4.2    1.087
3   7        π×7/12 = 1.8326       cos(1.8326) = -0.2588    3.8    -0.983
4   9        π×9/12 = 2.3562       cos(2.3562) = -0.7071    2.9    -2.051
5   11       π×11/12 = 2.8798      cos(2.8798) = -0.9659    1.5    -1.449
Step-by-Step Solution
n   (2n+1)   Angle (rad)           cos(angle)               X[n]   Product
0   1        2π×1/12 = 0.5236      cos(0.5236) = 0.8660     2.1    1.819
1   3        2π×3/12 = 1.5708      cos(1.5708) = 0.0000     3.4    0.000
2   5        2π×5/12 = 2.6180      cos(2.6180) = -0.8660    4.2    -3.637
3   7        2π×7/12 = 3.6652      cos(3.6652) = -0.8660    3.8    -3.291
4   9        2π×9/12 = 4.7124      cos(4.7124) = 0.0000     2.9    0.000
5   11       2π×11/12 = 5.7596     cos(5.7596) = 0.8660     1.5    1.299


Step-by-Step Solution
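The two tables can be reproduced with a short script. They correspond to the sum of X[n]·cos(πk(2n+1)/(2N)) with N = 6, for k = 1 (angles π(2n+1)/12) and k = 2 (angles 2π(2n+1)/12). The sketch assumes no extra scaling factor, since none is visible in the tables, and makes no assumption about what the six X[n] values represent.

```python
import math

# Input values taken from the X[n] column of the tables.
X = [2.1, 3.4, 4.2, 3.8, 2.9, 1.5]
N = len(X)

def dct_terms(k):
    """Per-sample products X[n] * cos(pi * k * (2n + 1) / (2N))."""
    return [X[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N)]

def dct_sum(k):
    """Unscaled DCT-II sum for coefficient index k."""
    return sum(dct_terms(k))

for k in (1, 2):
    products = ", ".join(f"{p:.3f}" for p in dct_terms(k))
    print(f"k={k}: products = [{products}], sum = {dct_sum(k):.3f}")
```

Adding the Product column of each table gives about 1.037 for k = 1 and about -3.811 for k = 2 before any normalisation is applied.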
Linear prediction cepstral coefficients (LPCC)
Detailed Example Problem
LPCC vs MFCC: Key Differences
Aspect                LPCC                        MFCC
Basis                 Linear Prediction model     Filter bank analysis
Domain                Time-domain modeling        Frequency-domain analysis
Model Type            All-pole model              No explicit model
Computational Load    Lower (Levinson-Durbin)     Higher (FFT + Filter banks)
Formant Modeling      Excellent                   Good
Noise Robustness      Less robust                 More robust
Pitch Sensitivity     More sensitive              Less sensitive
Common Applications   Speech coding, synthesis    Speech recognition
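The LPCC column can be illustrated with a compact sketch: the Levinson-Durbin recursion solves for the all-pole predictor from autocorrelation values, and a second recursion converts the predictor into cepstral coefficients. The sketch assumes the convention H(z) = G / (1 - Σ a_k z^-k) and omits the gain term c_0; the function names and the toy autocorrelation sequence are illustrative.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations; returns ([a_1..a_p], prediction error)."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def lpc_to_cepstrum(a, n_ceps):
    """c_n = a_n + sum_{k} (k/n) c_k a_{n-k}, with a_n = 0 for n > p."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)         # c[0] (gain term) is not computed
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

# Toy check: r[k] = 0.5**k is the autocorrelation shape of a first-order
# all-pole process, so an order-2 fit should give a_1 = 0.5 and a_2 = 0.
r = [1.0, 0.5, 0.25]
a, err = levinson_durbin(r, order=2)
lpcc = lpc_to_cepstrum(a, n_ceps=3)
print(a, err, lpcc)
```

For a real frame, `autocorr(frame, p)` would supply `r`; the frame would normally be windowed first.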
Advantages of LPCC
Limitations of LPCC
Choosing LPC Order (p)
Applications
Gammatone Frequency Cepstral Coefficients (GFCC)
The Biological Inspiration: The Cochlea
GFCC Computation Pipeline
Complete GFCC Algorithm
GFCC Performs Better in Noise
Applications
Computational Complexity
GFCC Advantages
GFCC Computation Pipeline
Filter Bank Differences
Aspect                 MFCC (Mel-filter Bank)   GFCC (Gammatone Filter Bank)
Filter Shape           Triangular               Gammatone (asymmetric, rounded)
Biological Basis       Rough approximation      Detailed cochlear model
Temporal Resolution    Poor                     Excellent (models impulse response)
Frequency Resolution   Uniform on Mel-scale     ERB-scale (more accurate)
Phase Information      Ignored                  Partially preserved
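The two filter shapes in the table can be sketched side by side. The formulas below are commonly used forms and are assumptions here, since the slides do not show which ones they use: the 2595·log10(1 + f/700) mel mapping, the Glasberg and Moore ERB approximation 24.7·(4.37·f/1000 + 1), and a fourth-order gammatone impulse response with bandwidth 1.019·ERB.

```python
import math

def hz_to_mel(f_hz):
    """Mel mapping used to place the triangular MFCC filters."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def triangular_response(f_hz, lo, centre, hi):
    """Frequency-domain weight of one triangular mel filter."""
    if f_hz <= lo or f_hz >= hi:
        return 0.0
    if f_hz <= centre:
        return (f_hz - lo) / (centre - lo)
    return (hi - f_hz) / (hi - centre)

def gammatone_ir(t, fc, order=4):
    """Time-domain gammatone impulse response (unnormalised), t in seconds."""
    b = 1.019 * erb_bandwidth(fc)
    return t ** (order - 1) * math.exp(-2 * math.pi * b * t) * math.cos(2 * math.pi * fc * t)

fs = 16000
fc = 1000.0
ir = [gammatone_ir(n / fs, fc) for n in range(400)]   # 25 ms of impulse response

print("ERB at 1 kHz (Hz):", erb_bandwidth(fc))
print("1 kHz on the mel scale:", hz_to_mel(fc))
print("triangular weight at 950 Hz:", triangular_response(950.0, 900.0, 1000.0, 1100.0))
```

The triangular filter is defined only as a set of spectral weights, whereas the gammatone filter is defined by an impulse response in time, which is the distinction the temporal-resolution and phase rows point at.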
