Dealing with delay jitter

Unless packets are lost, they will arrive if we wait long enough, but then the total delay may exceed the threshold required for interactive speech (~180 ms)!

Transcript

[slide82] So, ideally, what do we do? We adaptively set the delay of our de-jitter buffer so that, on average, we get the packets in time to play them. So, how big do you think the de-jitter buffer is? Let's say we're using 20 millisecond audio frames. It should be just big enough that we catch most of the packets in time, without imposing more delay than we need. So, with 20 millisecond frames, we might purposely impose a delay of, let's say, 30 milliseconds. That's one and a half packets' worth, or perhaps slightly longer: two packets' worth, maybe even three, but it's not very big. Why don't we want the delay to be very big? Because delay is additive. Once we put delay into the system, we don't have an easy way of removing it. And we saw that, from an interactive point of view, we need to bound the total delay to something in the neighborhood of 180 milliseconds. But that 180 milliseconds has to include the time for coding and decoding, the time for packetization, the time for transmission, and the time for buffering and playout. So we only have a limited budget to work with, and we don't want to make the buffer too big.
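To make the idea of an adaptively sized de-jitter buffer concrete, here is a minimal Python sketch. The lecture does not say how the adaptation is done, so the method here is an assumption: a smoothed (exponentially weighted) estimate of the transit time and of its deviation, with the buffer delay set to a multiple of the deviation and capped at three 20 ms frames. The names `PlayoutEstimator`, `fits_budget`, `alpha`, `k`, and `max_buffer_ms`, and all the numbers other than the 20 ms frame and the 180 ms budget, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PlayoutEstimator:
    """Hypothetical adaptive de-jitter buffer sizing (not from the lecture)."""
    alpha: float = 0.9           # assumed smoothing weight
    k: float = 4.0               # assumed safety multiplier on the deviation
    max_buffer_ms: float = 60.0  # assumed cap: three 20 ms frames
    mean_ms: Optional[float] = None
    dev_ms: float = 0.0

    def update(self, transit_ms: float) -> float:
        """Feed one packet's transit time; return the suggested buffer delay."""
        if self.mean_ms is None:
            # First packet: nothing to compare against yet.
            self.mean_ms = transit_ms
        else:
            # Smooth the mean first, then the deviation from that mean.
            self.mean_ms = self.alpha * self.mean_ms + (1 - self.alpha) * transit_ms
            self.dev_ms = (self.alpha * self.dev_ms
                           + (1 - self.alpha) * abs(transit_ms - self.mean_ms))
        # More jitter means a bigger buffer, but never beyond the cap.
        return min(self.k * self.dev_ms, self.max_buffer_ms)


def fits_budget(other_delays_ms: Iterable[float], buffer_ms: float,
                budget_ms: float = 180.0) -> bool:
    """Delay is additive: every stage plus the buffer must fit the budget."""
    return sum(other_delays_ms) + buffer_ms <= budget_ms


if __name__ == "__main__":
    est = PlayoutEstimator()
    buffer_ms = 0.0
    for transit in [50, 55, 48, 70, 52, 49, 66, 51]:  # made-up transit times (ms)
        buffer_ms = est.update(transit)
    # Made-up per-stage delays (ms): coding, packetization, network, decoding.
    print(buffer_ms, fits_budget([20, 20, 50, 5], buffer_ms))
```

Only the variation in transit time matters here, so the sender and receiver clocks do not need to agree on absolute time as long as the offset between them is roughly constant. The cap reflects the point made above: because delay is additive, a buffer that keeps growing to catch every late packet would eat the whole interactive budget.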