Voice Quality Tests
Mean Opinion Score (MOS) - defined in ITU-T P.800
- ITU test based on using 40 or more people from different ethnic or language backgrounds listening to audio samples of several seconds each
- Human listeners rating the quality from 1 to 5; 5 being perfect, 4 “toll-quality”, …
Perceptual Speech Quality Measurement (PSQM) - ITU-T P.861
- A computer algorithm - so it is easy to automate
- scale of 0 to 6.5, with 0 being perfect
- Designed for testing codecs
- test tools from JDSU VQT, QEmpirix, Finisar, … - cost US$50k and up
PSQM+
- Developed by Opticom
- for VoIP testing
PESQ (Perceptual Evaluation of Speech Quality)
- submitted to ITU-T by Psytechnics, Opticom, and SwissQual
- 0.95 correlation with human listeners
- ITU-T P.862 standard Dec. 2003
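As a small illustration of the MOS item above: a mean opinion score is the arithmetic mean of the listeners' 1 to 5 ratings for a sample. The sketch below is a minimal example, not the P.800 procedure itself; the function name `mean_opinion_score` and the panel ratings are made up for illustration.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the arithmetic mean of listener ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        # Each listener pushes one of five buttons: 1 (bad) .. 5 (excellent).
        if r not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating out of range: {r!r}")
    return mean(ratings)


# Hypothetical ratings from a 40-listener panel for one audio sample
# (illustrative data only, not taken from any real test).
ratings = [4] * 22 + [5] * 6 + [3] * 10 + [2] * 2
mos = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} from {len(ratings)} listeners")
```

With these made-up ratings the sample scores a little below 4, the level the slide labels "toll-quality".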
Slide Notes
CCITT Recommendation P.800, Methods for Subjective Determination of Transmission Quality, specifically Section 7: Subjective Opinion Tests, paragraph 3.1.2.3 Silence (gap) characteristics, CCITT, 1988. http://starlet.deltatel.ru/ccitt/1988/ascii/5_1_06.txt
ITU-T, Methods for Subjective Determination of Transmission Quality, ITU-T Recommendation P.800, March 1993.
JDSU (formerly Agilent) Voice Quality Tester (VQT) J1981B http://www.jdsu.com/
Transcript
[slide466] However, as soon as we talk about quality of service, the next problem becomes, how do we measure it? Well, a mean opinion score typically requires 40 or more people to listen to the audio and rate the quality of the call. And to me, this has to be one of the worst jobs in life, being a scorer for this. Going all day long, listening to little snippets of conversation in the different languages that you speak, and pushing one of five buttons. But somebody has to do it. Or at least that was the model previously. And then people started saying, hey, computers can do that. So Perceptual Speech Quality Measurement came out in ITU-T P.861. And now the idea is we have a computer rate it, and then we compare that with the human ratings. And if we get a good correlation, we say, yeah, what a great idea. There are companies that make tools for that. They're pretty expensive. Then there's PSQM+, developed by Opticom. And then PESQ, Perceptual Evaluation of Speech Quality. It has been shown to have about a 0.9 correlation with human listeners. So it's very comparable to human listeners. And it's been a standard since December 2003.
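The comparison described in the transcript, having a computer rate the samples and then checking how well its scores track the human ratings, can be sketched as a Pearson correlation. This is a minimal illustration under stated assumptions: the helper `pearson` and both score lists are hypothetical, and it is not how the ITU-T validated PSQM or PESQ.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up scores for six audio samples (illustrative only):
# human_mos is the panel's MOS, objective is an algorithm's score
# on a scale where higher means better.
human_mos = [1.8, 2.5, 3.1, 3.6, 4.2, 4.4]
objective = [1.6, 2.7, 3.0, 3.4, 4.0, 4.3]

r = pearson(human_mos, objective)
print(f"correlation with human listeners: {r:.3f}")
```

For a measure such as PSQM, where 0 is perfect and larger values are worse, a good algorithm would show a strong negative correlation with MOS, so the magnitude of the coefficient is what matters.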