RItsumeikan Shout Corpus (RISC)

The RItsumeikan Shout Corpus (RISC) contains a wide variety of shouted speech samples collected in recording experiments. Each shouted speech sample in RISC has a shout type and is assigned shout intensity ratings collected via a crowdsourcing service. RISC supports two kinds of shout recognition tasks: shout type classification and shout intensity prediction.

Normal and shouted speech

RISC contains speech samples from 50 speakers (21 female and 29 male), each uttering 50 sentences in two utterance styles: normal and shouted. In the sentence list [CSV format, UTF-8 w/o BOM], each sentence written in Japanese is converted into its English phoneme representation following the conversion rules of the speech segmentation toolkit in the speech recognition engine Julius.

Shout intensity ratings

Each shouted speech sample is assigned shout intensity ratings by ten listeners, each rating on a scale from 1 to 7. In total, RISC contains 2,500 shouted speech samples with shout intensity ratings and 2,500 normal speech samples (50 speakers × 50 sentences per utterance style).

File format

RISC consists of the following two types of files:

Speech files: [speaker index]_[speech type]_[sentence index].wav

The rules for naming speech data are as follows:

[speaker index]
‘f’ and ‘m’ in the speaker indexes indicate female and male speakers, respectively.

[speech type]
‘n’ and ‘s’ indicate normal and shouted speech, respectively.

[sentence index]
The meaning of each sentence index is as follows:

01–05: vowel sentences
06–10: sentences that are difficult to classify as typical of hazardous or less hazardous situations
11–30: sentences specific to less hazardous situations
31–50: sentences specific to highly hazardous situations
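
The naming rules above can be applied programmatically. The following is a minimal Python sketch, not part of the RISC distribution: the function name parse_speech_filename, the SpeechFile tuple, and the short category labels are hypothetical names chosen for illustration, and the pattern assumes that every file name follows the form shown above (for example, f1_s_01.wav).

```python
import re
from typing import NamedTuple

# Pattern assumed from the naming rules above, e.g. "f1_s_01.wav".
_NAME_RE = re.compile(r"^([fm])(\d+)_([ns])_(\d{2})\.wav$")

# Inclusive sentence-index ranges as listed above.
# The short labels are shorthand chosen for this sketch (an assumption).
_CATEGORIES = [
    (1, 5, "vowel"),
    (6, 10, "ambiguous"),
    (11, 30, "less hazardous"),
    (31, 50, "highly hazardous"),
]


class SpeechFile(NamedTuple):
    gender: str    # "female" or "male"
    speaker: str   # speaker index, e.g. "f1"
    style: str     # "normal" or "shouted"
    sentence: int  # sentence index, 1 to 50
    category: str  # label from _CATEGORIES


def parse_speech_filename(name: str) -> SpeechFile:
    """Split a RISC speech file name into its documented components."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    gender_code, number, style_code, sentence_text = match.groups()
    sentence = int(sentence_text)
    for low, high, label in _CATEGORIES:
        if low <= sentence <= high:
            return SpeechFile(
                gender="female" if gender_code == "f" else "male",
                speaker=f"{gender_code}{number}",
                style="normal" if style_code == "n" else "shouted",
                sentence=sentence,
                category=label,
            )
    raise ValueError(f"sentence index out of range: {name!r}")
```

For example, parse_speech_filename("m29_n_31.wav") yields a male speaker m29, normal style, sentence 31, in the highly hazardous group.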

Shout intensity file: shout_intensity_ratings.csv

The first column of this CSV file contains the name of each speech file, and columns 2 through 11 contain the shout intensity ratings for the corresponding speech file as rated by ten listeners.

Example: f1_s_01.wav,2,3,3,3,3,2,2,3,1,1
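
As an illustration of this layout, the sketch below reads rows in the format of the example line and averages the ratings per file. It is a rough sketch under assumptions: the helper name read_ratings is hypothetical, and it assumes the file has no header row and that every rating is an integer, which the description above suggests but does not state explicitly.

```python
import csv
import io
from statistics import mean


def read_ratings(lines):
    """Map each speech file name to its list of integer ratings.

    `lines` is any iterable of CSV text lines, such as an open file.
    Assumes no header row (an assumption, not confirmed by the source).
    """
    ratings = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        name, *scores = row
        ratings[name] = [int(score) for score in scores]
    return ratings


# The example row from the description above, used as in-memory input.
sample = io.StringIO("f1_s_01.wav,2,3,3,3,3,2,2,3,1,1\n")
ratings = read_ratings(sample)
mean_rating = mean(ratings["f1_s_01.wav"])  # ratings sum to 23 over 10 listeners
print(mean_rating)
```

To read the actual file, the same function could be called on open("shout_intensity_ratings.csv", newline="", encoding="utf-8"), assuming that encoding. Averaging over listeners is only one possible way to summarize the ratings.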

Directory structure

The directory structure of RISC is as follows:

			
RISC
|
|--- shout_intensity_ratings.csv
|
|--- speech
	|
	|-- normal
	|   |
	|   |-- f1
	|   |   |-- f1_n_01.wav
	|   |   |-- f1_n_02.wav
	|   |   |-- f1_n_03.wav
	|   |   |-     .
	|   |   |-     .
	|   |   |-     .
	|   |   |-- f1_n_50.wav
	|   |
	|   |-- f2
	|   |-- f3
	|   |-- f4
	|   |-   .
	|   |-   .
	|   |-   .
	|   |-- f21
	|   |-- m1
	|   |   |-- m1_n_01.wav
	|   |-   .
	|   |-   .
	|   |-   .
	|   |-- m29
	|
	|--- shout
	|    |
	|    |-- f1
	|    |   |-- f1_s_01.wav	
	|    |   |-- f1_s_02.wav	
	.    .   .      .
	.    .   .      .	
	.    .   .      .
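
Given this layout, the location of a speech file can be derived from its name alone. The sketch below is a hedged illustration: the helper names speech_path and expected_speech_paths are hypothetical, the speaker indexes f1 to f21 and m1 to m29 are inferred from the tree above and the speaker counts, and the shout directory is assumed to mirror the normal directory.

```python
from pathlib import PurePosixPath

# Directory names taken from the tree above.
_STYLE_DIRS = {"n": "normal", "s": "shout"}


def speech_path(root, filename):
    """Build the path <root>/speech/<style dir>/<speaker>/<filename>."""
    speaker, style, _ = filename.split("_", 2)
    return PurePosixPath(root) / "speech" / _STYLE_DIRS[style] / speaker / filename


def expected_speech_paths(root):
    """List every speech file path implied by the documented layout.

    Assumes speakers f1..f21 and m1..m29 and sentences 01..50 in both styles.
    """
    speakers = [f"f{i}" for i in range(1, 22)] + [f"m{i}" for i in range(1, 30)]
    return [
        speech_path(root, f"{speaker}_{style}_{sentence:02d}.wav")
        for style in ("n", "s")
        for speaker in speakers
        for sentence in range(1, 51)
    ]


paths = expected_speech_paths("RISC")
print(len(paths))  # 50 speakers x 50 sentences x 2 styles
```

Comparing such a list against the extracted archive is one way to check that a download is complete.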

Terms of use

RISC may be used for the following purposes:

Research by academic institutions
Noncommercial research, including research conducted within commercial organizations
Personal use

Download

RISC can be downloaded HERE. [Zip format, 296 MB]

Contributors

Takahiro Fukumori (Ritsumeikan University, affiliation at the time), Main Contributor
Taito Ishida (Ritsumeikan University, affiliation at the time)
Yoichi Yamashita (Ritsumeikan University)

Citation

Takahiro Fukumori, Taito Ishida, and Yoichi Yamashita, "RISC: A Corpus for Shout Type Classification and Shout Intensity Prediction," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 4434-4444, DOI: 10.1109/TASLP.2024.3473302, 2024.

Acknowledgment

This work was supported by JSPS KAKENHI Grant Number JP21K14381.

Contact

Takahiro Fukumori
Email: takahiro.fukumori (at) ieee.org

Last Updated: 2025/4/1