Switchboard Affect (SWB-Affect)

The Switchboard Affect dataset contains perceptual emotion annotations for 10,000 publicly available audio segments, totaling 25 hours of speech. The source audio files can be acquired from LDC (https://catalog.ldc.upenn.edu/LDC97S62), and this repository includes a script to extract the audio segments from the corpus.

Annotation Descriptions

Each segment was annotated independently by 6 graders who passed a training and certification process. The set of labels includes both categorical and dimensional emotions, and we provide a detailed format (annotations from each grader) as well as a consensus format (annotations aggregated for each segment).

Emotion Labels

  • Categorical emotions.

    Selections of primary and secondary emotions from the following set:

    Anger       Contempt     Disgust
    Sadness     Fear         Surprise
    Happiness   Tenderness   Calmness
    Neutral     Other
  • Dimensional emotions.

    Ratings for valence, activation, and dominance ranging from 1 to 5:

    Dimension    1          5
    Valence      negative   positive
    Activation   drained    energetic
    Dominance    weak       strong

Data Files

  • labels_detailed.csv includes annotations from each annotator, unaggregated.
  • labels_consensus.csv includes consensus annotations aggregated for each segment. For consensus on categorical emotions, 50%+ of graders need to agree on a primary or secondary emotion. For consensus on dimensional emotions, we take the mean of ratings by all annotators.
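
The consensus rules above can be illustrated with a minimal sketch. It is not the code used to build labels_consensus.csv: the function names are hypothetical, the inputs are plain Python lists rather than the CSV columns (which are not described here), and "50%+" is assumed to mean "at least 50%". How ties, or several emotions reaching the threshold, are handled in the released file is not specified here.

```python
from collections import Counter


def categorical_consensus(selections, threshold=0.5):
    """Return the emotions chosen by at least `threshold` of the graders.

    `selections` holds one entry per grader; each entry is the collection of
    emotions that grader picked as primary or secondary.
    Assumption: "50%+" is read as a fraction >= 0.5.
    """
    n_graders = len(selections)
    if n_graders == 0:
        return []
    counts = Counter()
    for chosen in selections:
        # A grader counts at most once per emotion, even if the same
        # emotion was picked as both primary and secondary.
        counts.update(set(chosen))
    return sorted(e for e, c in counts.items() if c / n_graders >= threshold)


def dimensional_consensus(ratings):
    """Return the mean of the 1-to-5 ratings given by all graders."""
    return sum(ratings) / len(ratings)


# Hypothetical annotations from 6 graders for one segment.
example_selections = [
    ("Happiness", "Calmness"),
    ("Happiness",),
    ("Neutral", "Happiness"),
    ("Neutral",),
    ("Calmness", "Neutral"),
    ("Surprise",),
]
example_valence = [4, 4, 3, 5, 4, 4]

print(categorical_consensus(example_selections))
print(dimensional_consensus(example_valence))
```

Passing a different `threshold` shows how sensitive the categorical consensus is to the reading of "50%+".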

Segment Extraction Script

The script extract_segments.py reads in the raw audio and metadata from the LDC corpus and saves .wav files for each segment. [LDC_DIR] refers to the folder that contains raw audio files (in subfolder swb1_LDC97S62) and segment metadata (in subfolder ms98_transcriptions). [SEG_DIR] refers to the folder in which you want to save the segments.

To extract the segments, run the following from this directory:

python3 extract_segments.py --ldc_dir [LDC_DIR] --seg_dir [SEG_DIR] 
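
Before running the script, it may help to confirm that [LDC_DIR] has the layout described above. The following is a small optional sketch, not part of extract_segments.py; the helper name `missing_ldc_subfolders` is hypothetical, and it only checks that the two subfolders named in this README exist.

```python
from pathlib import Path

# Subfolders that this README says [LDC_DIR] should contain.
EXPECTED_SUBFOLDERS = ("swb1_LDC97S62", "ms98_transcriptions")


def missing_ldc_subfolders(ldc_dir):
    """Return the expected subfolder names that are absent from `ldc_dir`."""
    root = Path(ldc_dir)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]
```

An empty list means both subfolders were found; otherwise the returned names show what is missing from [LDC_DIR].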

Citation

If you find the SWB-Affect dataset or this code useful in your research, please cite the following paper:

@misc{romana2025,
    author       = {Amrit Romana AND Jaya Narain AND Tien Dung Tran AND Andrea Davis AND Jason Fong AND Ramya Rasipuram AND Vikramjit Mitra},
    title        = {Switchboard-Affect: Emotion Perception Labels from Conversational Speech},
    howpublished = {ACII 2025},
}
