
Each excerpt has been analyzed in terms of acoustic, linguistic, and perceptual features. Acoustic features include fundamental frequency (F0) mean, F0 standard deviation, F0 stability, and spectral centroid. Linguistic features include a textual representation of the speech, as well as both automatically and manually obtained syllable counts. From the syllable counts, the corpus additionally provides a measure of speaking rate (syllables per second). Perceptual ratings of melody, rhythm, and pleasantness were provided by a small (n = 5) group of internal raters who rated the entire corpus. Additionally, subsets of excerpts from the corpus were rated by a larger number of raters (total n = 420) as part of corpus validation.
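As an illustration, the summary measures described above (F0 mean, F0 standard deviation, and speaking rate) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the corpus's actual analysis pipeline: the function names and example values are hypothetical, the F0 contour stands in for voiced-frame pitch estimates from a pitch tracker, and the corpus's exact definition of F0 stability is not specified here, so it is omitted.

```python
from statistics import mean, stdev

def speaking_rate(syllable_count, duration_s):
    """Speaking rate in syllables per second, from a syllable count
    (automatic or manual) and the excerpt duration."""
    return syllable_count / duration_s

def f0_summary(f0_hz):
    """Mean and standard deviation of a voiced-frame F0 contour (Hz)."""
    return mean(f0_hz), stdev(f0_hz)

# Hypothetical excerpt: 9 syllables over 2.5 seconds
rate = speaking_rate(9, 2.5)  # 3.6 syllables per second

# Hypothetical F0 contour (Hz) for a handful of voiced frames
f0_mean, f0_sd = f0_summary([210.0, 215.0, 212.0, 208.0, 220.0])
```

A lower F0 standard deviation over an excerpt corresponds to a flatter, more monotone pitch contour, whereas a higher value indicates wider pitch movement of the kind that may contribute to perceived melodicity.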
This corpus was developed to allow researchers to investigate the incidental musicality of speech. Although listeners rarely confuse speech with song, both are dynamic signals that vary in pitch and timing. Thus, even when a speaker does not intend it, the speech signal may occasionally sound musical. We anticipate that the HuMS Corpus can be used to assess how the musicality of speech differs as a function of speaker sex and intended audience (child-directed versus adult-directed), as well as how the musicality of speech relates to how speech is attended to and remembered. Additional uses of the corpus might include generating new "speech-to-song" stimuli, training models to differentiate speech from song, or other applications we have not considered!
