RItsumeikan Shout Corpus (RISC)
The RItsumeikan Shout Corpus (RISC) contains a wide variety of shouted speech samples collected in recording experiments. Each shouted speech sample in RISC is labeled with a shout type and is assigned shout intensity ratings collected through a crowdsourcing service. RISC supports two shout recognition tasks: shout type classification and shout intensity prediction.
Contents
- Normal and shouted speech
- Shout intensity ratings
RISC contains speech samples from 50 speakers (21 female and 29 male), each uttering 50 sentences in two utterance styles: normal and shouting. In the sentence list [CSV format, UTF-8 w/o BOM], each sentence written in Japanese is converted into its English phoneme representation following the conversion rules of the speech segmentation toolkit in the speech recognition engine Julius.
Each shouted speech sample is assigned shout intensity ratings from ten listeners, each rating on a scale from 1 to 7. In summary, RISC contains 2,500 shouted speech samples with shout intensity ratings and 2,500 normal speech samples (50 speakers × 50 sentences per utterance style).
File format
RISC consists of the following two types of files:
- Speech files: [speaker index]_[speech type]_[sentence index].wav
  The rules for naming speech data are as follows:
  [speaker index]
  ‘f’ and ‘m’ in the speaker indexes indicate female and male speakers, respectively.
  [speech type]
  ‘n’ and ‘s’ indicate normal and shouted speech, respectively.
  [sentence index]
  The meaning of each sentence index is as follows:
  - 01–05: vowel sentences
  - 06–10: sentences that are difficult to classify as typical of hazardous or less hazardous situations
  - 11–30: sentences specific to less hazardous situations
  - 31–50: sentences specific to highly hazardous situations
- Shout intensity file: shout_intensity_ratings.csv
  The first column of this CSV file contains the name of each speech file, and columns 2 through 11 contain the shout intensity ratings for the corresponding speech file as rated by ten listeners.
  Example) f1_s_01.wav,2,3,3,3,3,2,2,3,1,1
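The naming rules and the CSV layout described above can be read programmatically. The following Python code is a minimal sketch, not part of the RISC distribution: the names NAME_RE, sentence_category, parse_name, and read_ratings are hypothetical, and the short category labels are our own abbreviations of the four sentence groups. It assumes that every data row has exactly eleven columns; rows that do not start with a valid speech file name (for example, a possible header row) are skipped.

```python
import csv
import io
import re
from statistics import mean

# "[speaker index]_[speech type]_[sentence index].wav", e.g. "f1_s_01.wav".
NAME_RE = re.compile(
    r"^(?P<sex>[fm])(?P<num>\d+)_(?P<style>[ns])_(?P<sentence>\d{2})\.wav$"
)


def sentence_category(index):
    """Map a sentence index (1-50) to one of the four sentence groups."""
    if 1 <= index <= 5:
        return "vowel"
    if 6 <= index <= 10:
        return "difficult to classify"
    if 11 <= index <= 30:
        return "less hazardous"
    if 31 <= index <= 50:
        return "highly hazardous"
    raise ValueError(f"sentence index out of range: {index}")


def parse_name(filename):
    """Split a speech file name into its documented components."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    index = int(m.group("sentence"))
    return {
        "speaker": m.group("sex") + m.group("num"),
        "sex": "female" if m.group("sex") == "f" else "male",
        "style": "normal" if m.group("style") == "n" else "shout",
        "sentence": index,
        "category": sentence_category(index),
    }


def read_ratings(fileobj):
    """Yield (file name, list of ten integer ratings) for each data row."""
    for row in csv.reader(fileobj):
        # Assumption: skip anything that is not "name + ten ratings".
        if len(row) != 11 or NAME_RE.match(row[0]) is None:
            continue
        yield row[0], [int(value) for value in row[1:]]


# Demo using the example row from the description above.
sample = io.StringIO("f1_s_01.wav,2,3,3,3,3,2,2,3,1,1\n")
for name, ratings in read_ratings(sample):
    info = parse_name(name)
    print(info["speaker"], info["style"], info["category"], mean(ratings))
```

Averaging the ten ratings is only one possible way to summarize them and is used here for illustration; it is not stated above how the ratings should be aggregated. To read the real file, pass an open file object for shout_intensity_ratings.csv instead of the in-memory sample.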
Directory structure
The directory structure of RISC is as follows:
RISC
 |
 |--- shout_intensity_ratings.csv
 |
 |--- speech
       |
       |-- normal
       |     |
       |     |-- f1
       |     |    |-- f1_n_01.wav
       |     |    |-- f1_n_02.wav
       |     |    |-- f1_n_03.wav
       |     |    |- .
       |     |    |- .
       |     |    |- .
       |     |    |-- f1_n_50.wav
       |     |
       |     |-- f2
       |     |-- f3
       |     |-- f4
       |     |- .
       |     |- .
       |     |- .
       |     |-- f21
       |     |-- m1
       |     |    |-- m1_n_01.wav
       |     |- .
       |     |- .
       |     |- .
       |     |-- m29
       |
       |--- shout
       |     |
       |     |-- f1
       |     |    |-- f1_s_01.wav
       |     |    |-- f1_s_02.wav
       .     .    .    .
       .     .    .    .
       .     .    .    .
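Given this layout, the path of any speech file can be built from its speaker index, speech type, and sentence index. The following Python code is a minimal sketch under stated assumptions: the root directory name "RISC" stands for wherever the archive was extracted, the function names are hypothetical, and the "shout" directory is assumed to mirror the "normal" directory (the tree above elides most of it).

```python
from pathlib import Path

# Speech type letter in the file name -> directory name in the tree.
STYLE_DIRS = {"n": "normal", "s": "shout"}


def speakers():
    """Speaker indexes f1..f21 and m1..m29, as shown in the tree."""
    return [f"f{i}" for i in range(1, 22)] + [f"m{i}" for i in range(1, 30)]


def speech_path(root, speaker, style, sentence):
    """Build the path RISC/speech/<style dir>/<speaker>/<file name>."""
    name = f"{speaker}_{style}_{sentence:02d}.wav"
    return Path(root) / "speech" / STYLE_DIRS[style] / speaker / name


def expected_files(root):
    """List every speech file path implied by the naming rules."""
    return [
        speech_path(root, speaker, style, sentence)
        for style in STYLE_DIRS
        for speaker in speakers()
        for sentence in range(1, 51)
    ]


def missing_files(root):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_files(root) if not path.is_file()]


print(speech_path("RISC", "f1", "s", 1).as_posix())
print(len(expected_files("RISC")))
```

After extracting the archive, calling missing_files on the extraction directory gives a quick check that all speech files are in place, provided the assumptions above hold.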
Terms of use
RISC may be used for the following purposes:
- Research by academic institutions
- Noncommercial research, including research conducted within commercial organizations
- Personal use
Download
RISC can be downloaded HERE. [Zip format, 296 MB]
Contributors
- Takahiro Fukumori (Ritsumeikan University, affiliation at the time), Main Contributor
- Taito Ishida (Ritsumeikan University, affiliation at the time)
- Yoichi Yamashita (Ritsumeikan University)
Citation
Takahiro Fukumori, Taito Ishida, and Yoichi Yamashita, "RISC: A Corpus for Shout Type Classification and Shout Intensity Prediction," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 4434-4444, DOI: 10.1109/TASLP.2024.3473302, 2024.
Acknowledgment
This work was supported by JSPS KAKENHI Grant Number JP21K14381.
Contact
- Takahiro Fukumori
- Email: takahiro.fukumori (at) ieee.org
Last Updated: 2025/4/1