Voice Quality Tests

Mean Opinion Score (MOS) - defined in ITU-T P.800

ITU test method using 40 or more listeners from different ethnic or language backgrounds, each listening to audio samples several seconds long
Human listeners rate the quality from 1 to 5: 5 being excellent, 4 “toll quality”, …
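The MOS itself is the arithmetic mean of the listeners' 1 to 5 ratings. The sketch below computes it for a panel of 40 hypothetical listeners. It covers only the averaging step, not the full P.800 test procedure. The approximate 95% confidence interval (normal approximation, z = 1.96) is an added assumption, not something the slide prescribes.

```python
from math import sqrt
from statistics import mean, stdev

# Absolute Category Rating labels used for listening-quality tests in ITU-T P.800.
ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mos(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    for r in ratings:
        if r not in ACR_LABELS:
            raise ValueError(f"rating {r} is not on the 1..5 ACR scale")
    m = mean(ratings)
    # Assumption: normal approximation, reasonable for a panel of 40+ listeners.
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return m, half_width


# Hypothetical ratings from 40 listeners for one audio sample.
ratings = [4, 5, 4, 3, 4] * 8
score, ci = mos(ratings)
print(f"MOS = {score:.2f} +/- {ci:.2f}")
```

With these made-up ratings the MOS is 4.00 with a half-width of about 0.20. Averaging many listeners is what makes the score stable, which is also why the panel has to be large.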

Perceptual Speech Quality Measurement (PSQM) - ITU-T P.861

A computer algorithm - so it is easy to automate
scale of 0 to 6.5, with 0 being perfect
Designed for testing codecs
test tools from JDSU VQT, Empirix, Finisar, …; cost US$50k and up

PSQM+

Developed by Opticom
for VoIP testing

PESQ (Perceptual Evaluation of Speech Quality)

submitted to ITU-T by Psytechnics, Opticom, and SwissQual
0.95 correlation with human listeners
ITU-T P.862 standard Dec. 2003
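An objective method is validated by scoring the same degraded samples both ways and correlating its output with the human MOS values. The sketch below computes a Pearson correlation coefficient by hand. All scores are invented for illustration and do not come from any real PESQ run.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / (sx * sy)


# Hypothetical data: subjective MOS from a listening panel and scores from an
# objective tool for the same six degraded speech samples.
subjective_mos = [4.3, 3.9, 3.1, 2.6, 1.8, 1.4]
objective_score = [4.1, 3.8, 3.3, 2.4, 2.0, 1.2]

r = pearson(objective_score, subjective_mos)
print(f"correlation r = {r:.3f}")
```

For these invented values r comes out at roughly 0.985. For PSQM, where 0 is the best score, a good tool would instead show a strong negative correlation with MOS, so the magnitude of r is what matters.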

Slide Notes

CCITT Recommendation P.800, Methods for Subjective Determination of Transmission Quality, specifically Section 7: Subjective Opinion Tests, paragraph 3.1.2.3 Silence (gap) characteristics, CCITT, 1988. http://starlet.deltatel.ru/ccitt/1988/ascii/5_1_06.txt

ITU-T, Recommendation P.800, Methods for Subjective Determination of Transmission Quality, March 1993.

JDSU (formerly Agilent) Voice Quality Tester (VQT) J1981B http://www.jdsu.com/

Transcript

[slide466] However, as soon as we talk about quality of service, the next problem becomes: how do we measure it? Well, mean opinion scores typically require 40 or more people to listen to calls and rate their quality. To me, being a scorer for this has to be one of the worst jobs in life: going all day long, listening to little snippets of conversation in the different languages that you speak, and pushing one of five buttons. But somebody has to do it. Or at least that was the model previously. Then people started saying, hey, computers can do that. So Perceptual Speech Quality Measurement came out in ITU-T P.861. Now the idea is that a computer rates the samples, and then we compare that with the human ratings. If we get a good correlation, we say, yeah, what a great idea. There are companies that make tools for that, and they're pretty expensive. Then there's PSQM+, developed by Opticom. And then there's PESQ, Perceptual Evaluation of Speech Quality. It has been shown to have about a 0.9 correlation with human listeners, so it's very comparable to human listeners. And it's been a standard since December 2003.