Perceived voice quality

There are very nice studies of the effects of delay on perceived voice quality; see, for example, R. G. Cole and J. H. Rosenbluth, “Voice over IP Performance Monitoring” [Cole 2001].

Id = 0.024d + 0.11d – 177.3H(d – 177.3)

d = one-way delay in ms

if x < 0 then H(x) = 0  else H(x) = 1 when x ≥ 0
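A minimal Python sketch of the formula, useful for experimenting with the 177.3 ms knee. The function names `heaviside` and `delay_impairment` are hypothetical, chosen only for illustration. The step function follows the convention above, with H(0) = 1.

```python
def heaviside(x: float) -> float:
    """Step function H(x): 0 for x < 0, 1 for x >= 0."""
    return 0.0 if x < 0 else 1.0


def delay_impairment(d_ms: float) -> float:
    """Delay impairment I_d for a one-way delay d_ms (in milliseconds).

    Piecewise-linear fit: 0.024 per ms everywhere, plus an extra
    0.11 per ms for every millisecond beyond the 177.3 ms knee.
    """
    knee = 177.3
    return 0.024 * d_ms + 0.11 * (d_ms - knee) * heaviside(d_ms - knee)


if __name__ == "__main__":
    # Print I_d at a few one-way delays on both sides of the knee.
    for d in (50, 100, 150, 177.3, 200, 300, 400):
        print(f"d = {d:6.1f} ms  ->  I_d = {delay_impairment(d):6.2f}")
```

For example, `delay_impairment(100)` gives 2.4. `delay_impairment(277.3)` gives 0.024 × 277.3 + 0.11 × 100 ≈ 17.66.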

[Figure: delay impairment $I_d$ versus one-way delay $d$ in ms]

The delay impairment ($I_d$) has roughly two linear regimes. For one-way delays below about 177 ms, conversation feels very natural. Above this point it becomes increasingly strained, eventually breaking down into effectively simplex communication.
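The two regimes follow directly from differentiating the piecewise formula for $I_d$. The slope is given in impairment units per ms of one-way delay:

```latex
$$
\frac{dI_d}{dd} =
\begin{cases}
0.024, & d < 177.3\ \text{ms} \\
0.024 + 0.11 = 0.134, & d > 177.3\ \text{ms}
\end{cases}
$$
```

Each millisecond beyond the knee therefore costs about 0.134 / 0.024 ≈ 5.6 times as much impairment as a millisecond below it. Going from 100 ms to 150 ms adds 50 × 0.024 = 1.2 to $I_d$. Going from 200 ms to 250 ms adds 50 × 0.134 = 6.7.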


Slide Notes

Also important are measures of delay, delay jitter, throughput, packet loss, etc. The IETF IP Performance Metrics (IPPM) working group is attempting to specify how to measure these quantities and how to exchange information about such measurements.

[Cole 2001] R. G. Cole and J. H. Rosenbluth, ‘Voice over IP performance monitoring’, ACM SIGCOMM Computer Communication Review, vol. 31, no. 2, p. 9, Apr. 2001. DOI: 10.1145/505666.505669


Transcript

[slide84] Now, this paper by Cole and Rosenbluth, "Voice over IP Performance Monitoring", is one of my favorite papers. They basically sat down and asked: if we have listeners listen to this audio, how bad does the audio sound when we add delay? What they found is this equation: I_d equals 0.024 d plus 0.11 times (d minus 177.3) times H(d minus 177.3). That 177.3 is the magic number, because it is the inflection point between the two behaviors. So, what they found is that, initially, small amounts of additional delay aren't perceived as bad quality by the user. But if we add a lot of delay, the quality gets bad very fast.

We can re-plot that in terms of the so-called mean opinion score (MOS), where a perfect call would have a score of 5. What they found is that the slope is initially very small, up to that 177.3 ms, and then the quality goes down very, very fast. So, the first plot is in terms of perceived increased delay, which they call delay impairment. You can convert it to an equivalent MOS, which is what you would get if you asked a panel of people to rate the quality. This behavior is really, really useful to understand.

There's a thesis by two students, from about five years ago now, who looked at this for haptic interfaces. Instead of voice being exchanged, we could be sending force-feedback information and controlling a remote machine or something like that. With voice, if each packet carries 20 milliseconds of audio, each packet covers 1/50th of a second; that is, we've got 50 packets per second. In a haptic system, by contrast, there is a 1000 Hz control loop. And the question was: what happens in the haptic system? Does it also fall off slowly and then very fast, but at a much shorter delay, or not? If you're interested, you can take a look at their thesis. The surprising thing is that they found it doesn't have to fall off very fast.
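The conversion from delay impairment to an equivalent MOS, mentioned in the transcript, can be sketched as follows. This is a minimal sketch under two assumptions that are not stated on the slide. First, it uses a simplified E-model rating R = 94.2 − I_d, ignoring equipment and loss impairments; the 94.2 baseline is an assumed default. Second, it uses the ITU-T G.107 mapping from R to estimated MOS. The function names are hypothetical. Note that this mapping tops out at 4.5 rather than 5.

```python
def delay_impairment(d_ms: float) -> float:
    """I_d = 0.024 d + 0.11 (d - 177.3) H(d - 177.3), with d in ms."""
    knee = 177.3
    excess = d_ms - knee
    return 0.024 * d_ms + (0.11 * excess if excess >= 0 else 0.0)


def r_factor(d_ms: float, baseline: float = 94.2) -> float:
    """Assumed simplified E-model: R = baseline - I_d (other impairments ignored)."""
    return baseline - delay_impairment(d_ms)


def r_to_mos(r: float) -> float:
    """ITU-T G.107 mapping from transmission rating R to estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


if __name__ == "__main__":
    # Tabulate R and estimated MOS for one-way delays around the knee.
    for d in (0, 100, 150, 177.3, 200, 250, 300, 400):
        r = r_factor(d)
        print(f"d = {d:6.1f} ms  R = {r:5.1f}  MOS = {r_to_mos(r):.2f}")
```

With these assumptions, the estimated MOS drops only about 0.09 between 0 ms and 177.3 ms of one-way delay. It then drops by roughly 0.58 more by 300 ms, mirroring the knee in $I_d$.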


GQMJr Notes

Martin Olofsson and Sebastian Öhman, Networked Haptics, Master's thesis, KTH, School of Information and Communication Technology, Communication Systems, TRITA-ICT-EX-2009:106, https://urn.kb.se/resolve?urn=urn%3Anbn%3Ase%3Akth%3Adiva-91494