RTP: Real-Time Transport Protocol

Originally defined in RFC 1889; now defined by RFC 3550, which obsoletes it.

Designed to carry a variety of real-time data, such as audio and video. Provides two key facilities:

  • Sequence number to restore the order of delivery
  • Timestamp to control playback

Provides no mechanisms to ensure timely delivery.
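
As a rough illustration of the two facilities above, the sketch below puts received packets back into sending order using the 16-bit sequence number (allowing for wraparound) and then turns timestamps into playback offsets. The function names, the (sequence number, timestamp) tuple layout, and the 8000 Hz clock rate in the usage note are assumptions made for this example, not part of the slide.

```python
SEQ_MOD = 1 << 16  # the RTP sequence number field is 16 bits wide
TS_MOD = 1 << 32   # the RTP timestamp field is 32 bits wide


def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a modulo 2**16 (positive when a is newer)."""
    d = (a - b) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def reorder(packets):
    """Sort (seq, timestamp) pairs into sending order.

    Ordering is relative to the first packet received, so it only works
    for packets that lie within half the sequence space of each other.
    """
    if not packets:
        return []
    base = packets[0][0]
    return sorted(packets, key=lambda p: seq_delta(p[0], base))


def playout_offsets(packets, clock_rate_hz):
    """Seconds after the earliest packet at which each packet should play."""
    ordered = reorder(packets)
    if not ordered:
        return []
    t0 = ordered[0][1]
    return [((ts - t0) % TS_MOD) / clock_rate_hz for _, ts in ordered]
```

For instance, `reorder([(65535, 160), (1, 480), (0, 320), (65534, 0)])` places sequence number 65534 first and 1 last, even though the counter wrapped from 65535 to 0 in between. Nothing here makes late packets arrive sooner; it only lets a receiver notice the order and timing the sender intended.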

RTP packet format

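Each row is 32 bits wide; bit positions are shown above the first row.

      0 1   2   3   4 5 6 7   8   9 .. 15    16 ..         31
    +-----+---+---+---------+---+----------+------------------+
    | VER | P | X |   CC    | M |  PTYPE   | Sequence number  |
    +-----+---+---+---------+---+----------+------------------+
    |                        Timestamp                        |
    +---------------------------------------------------------+
    |            Synchronization source identifier            |
    +---------------------------------------------------------+
    |              Contributing source ID (list)              |
    +---------------------------------------------------------+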

  • P: whether zero padding follows the payload
  • X: whether a header extension is present
  • M: marker bit, e.g. to flag the beginning of each frame (the exact meaning is set by the profile in use)
  • PTYPE: type of payload

We will address the other fields later.
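
To make the layout above concrete, here is a minimal sketch of how a receiver might unpack the header fields from raw bytes. The `RtpHeader` type and `parse_rtp_header` function are hypothetical names chosen for this example. It reads the 12-byte fixed header plus the contributing source list and does not handle header extensions or padding.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int            # VER, 2 bits
    padding: bool           # P, 1 bit
    extension: bool         # X, 1 bit
    csrc_count: int         # CC, 4 bits
    marker: bool            # M, 1 bit
    payload_type: int       # PTYPE, 7 bits
    sequence_number: int    # 16 bits
    timestamp: int          # 32 bits
    ssrc: int               # synchronization source identifier, 32 bits
    csrcs: Tuple[int, ...]  # CC contributing source IDs, 32 bits each


def parse_rtp_header(data: bytes) -> RtpHeader:
    """Unpack the fixed header and CSRC list (network byte order)."""
    if len(data) < 12:
        raise ValueError("the fixed RTP header is 12 bytes long")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    cc = b0 & 0x0F
    end = 12 + 4 * cc
    if len(data) < end:
        raise ValueError("packet is too short for its CSRC list")
    csrcs = struct.unpack(f"!{cc}I", data[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
```

Note that the first byte holds VER, P, X, and CC, and the second byte holds M and PTYPE, which is why the sketch reads two single bytes and then masks and shifts them.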


Slide Notes

[RFC 3550] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, ‘RTP: A Transport Protocol for Real-Time Applications’, Internet Request for Comments, RFC 3550 (Internet Standard), Jul. 2003 [Online]. Available: http://www.rfc-editor.org/rfc/rfc3550.txt


Transcript

[slide91] So how do we actually implement RTP? Well, RTP, we said, had to include a sequence number. So we have a 16-bit sequence number field, a 32-bit timestamp, and then something called a synchronization source identifier. Why do we need to identify who we're synchronizing with, and why do we need a contributing source ID, or actually a list of contributing source IDs?

Well, how many of you have ever been talking on a microphone, and then moved towards the speaker where that microphone audio is coming out? What happens? You get feedback. Right? It's feedback, not echo. So we need to put in the sources of whose audio it is, so that we make sure we never ever mix that audio back in.

And we all need to say who we're synchronizing with. This means we can actually have a distributed orchestra, where each of the people is playing a different instrument, and because they're synchronized to the same time source, we can take those separate streams, shift them in time, and play them out with the correct temporal positioning. But we have to agree on a time source, so we need to say which time source that is.

We have a version field; we have a padding field, which is a flag to indicate whether you're padding or not; we have an extension field; we have the CC field here, which tells how many contributing sources there are; we have a marker; and we have a payload type field.
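
The feedback point in the transcript can be sketched as a tiny mixer that builds a separate mix for each listener, leaves that listener's own audio out, and records who contributed. This is only an illustration under stated assumptions: `mix_for_listener` is a hypothetical helper, audio frames are plain lists of integer samples of equal length, and the dictionary keys stand in for synchronization source identifiers.

```python
MAX_CSRC = 15  # CC is a 4-bit field, so at most 15 contributors can be listed


def mix_for_listener(frames, listener_ssrc):
    """Mix every source except the listener's own.

    frames: dict mapping a source identifier to a list of integer samples.
    Returns (mixed_samples, contributing_source_ids).
    """
    contributors = sorted(s for s in frames if s != listener_ssrc)
    if not contributors:
        return [], []
    length = len(frames[contributors[0]])
    # Sum the samples position by position across the contributing sources.
    mixed = [sum(frames[s][i] for s in contributors) for i in range(length)]
    # Only the first 15 identifiers fit in the contributing source list.
    return mixed, contributors[:MAX_CSRC]
```

With three participants 1, 2, and 3, the mix sent to participant 2 contains only the audio of 1 and 3, and the returned identifier list is what a mixer would place in the contributing source ID fields.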