Secure Real-time Transport Protocol (SRTP)

SRTP, described in RFC 3711, provides confidentiality, message authentication, and replay protection for RTP and RTCP traffic.

Sender behavior	Receiver behavior
Determine cryptographic context to use	Read the SRTP packet from the socket.
Derive session keys from master key (via MIKEY)	Determine the cryptographic context to be used
	Determine the session keys from master key (via MIKEY)
	If message authentication and replay protection are provided, check for possible replay and verify the authentication tag
Encrypt the RTP payload	Decrypt the Encrypted Portion of the packet
If message authentication required, compute authentication tag and append	If present, remove authentication tag
Send the SRTP packet to the socket	Pass the RTP packet up the stack
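
The sender and receiver steps in the table can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the RFC 3711 algorithms: the function names (derive_session_keys, keystream, protect, unprotect) are hypothetical, the key derivation and the keystream use HMAC-SHA256 as a stand-in for the AES-CM based functions, the tag is HMAC-SHA1 truncated to the 4 bytes quoted on the slide, the RTP header is assumed to be the fixed 12 bytes (no CSRC list or extension), and the rollover counter that RFC 3711 includes in the authenticated data is omitted. Replay checking is reduced to a set of already seen (SSRC, sequence number) pairs.

```python
import hashlib
import hmac
import struct
from typing import Optional, Set, Tuple

TAG_LEN = 4      # bytes, the tag length quoted on the slide
HEADER_LEN = 12  # assumption: fixed RTP header, no CSRC list or extension


def derive_session_keys(master_key: bytes, master_salt: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical derivation (HMAC-SHA256 with labels), NOT the RFC 3711 key derivation."""
    enc_key = hmac.new(master_key, b"enc" + master_salt, hashlib.sha256).digest()
    auth_key = hmac.new(master_key, b"auth" + master_salt, hashlib.sha256).digest()
    return enc_key, auth_key


def keystream(enc_key: bytes, ssrc: int, seq: int, length: int) -> bytes:
    """Stand-in keystream (HMAC-SHA256 in counter mode) used here in place of AES-CM."""
    out = b""
    block = 0
    while len(out) < length:
        counter = struct.pack("!IHI", ssrc, seq, block)
        out += hmac.new(enc_key, counter, hashlib.sha256).digest()
        block += 1
    return out[:length]


def protect(rtp: bytes, enc_key: bytes, auth_key: bytes) -> bytes:
    """Sender side: encrypt the payload, then append the authentication tag."""
    header, payload = rtp[:HEADER_LEN], rtp[HEADER_LEN:]
    _, _, seq, _, ssrc = struct.unpack("!BBHII", header)
    ks = keystream(enc_key, ssrc, seq, len(payload))
    encrypted = bytes(p ^ k for p, k in zip(payload, ks))
    tag = hmac.new(auth_key, header + encrypted, hashlib.sha1).digest()[:TAG_LEN]
    return header + encrypted + tag


def unprotect(srtp: bytes, enc_key: bytes, auth_key: bytes,
              seen: Set[Tuple[int, int]]) -> Optional[bytes]:
    """Receiver side: replay check, tag verification, tag removal, decryption."""
    body, tag = srtp[:-TAG_LEN], srtp[-TAG_LEN:]
    header, encrypted = body[:HEADER_LEN], body[HEADER_LEN:]
    _, _, seq, _, ssrc = struct.unpack("!BBHII", header)
    if (ssrc, seq) in seen:
        return None  # replayed packet: drop it
    expected = hmac.new(auth_key, body, hashlib.sha1).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        return None  # tampered or forged packet: drop it
    seen.add((ssrc, seq))  # record only packets that authenticated
    ks = keystream(enc_key, ssrc, seq, len(encrypted))
    return header + bytes(c ^ k for c, k in zip(encrypted, ks))


# Usage with made-up values: an all-zero master key and salt, and a
# 12-byte header plus 160-byte payload (an assumed 172-byte RTP packet).
enc_key, auth_key = derive_session_keys(bytes(16), bytes(14))
rtp_packet = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234) + bytes(160)
srtp_packet = protect(rtp_packet, enc_key, auth_key)
seen_packets: Set[Tuple[int, int]] = set()
first = unprotect(srtp_packet, enc_key, auth_key, seen_packets)   # the RTP packet
second = unprotect(srtp_packet, enc_key, auth_key, seen_packets)  # None (replay)
```

Note that the only bytes added to the packet are the tag: the sequence number and SSRC already present in the RTP header feed both the keystream and the replay check.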

In 2003, Israel M. Abad Caballero, Secure Mobile Voice over IP, M.Sc. Thesis [Caballero 2003]

AES CM (Rijndael) or Null Cipher for encryption (using libcrypto)
HMAC or Null authenticator for message authentication
SRTP packet is 176 bytes (RTP + 4 for the authentication tag if message authentication is to be provided)
Packet creation: RTP 3-5 µs; RTP+SRTP 76-80 µs (throughput 20 Mbps)
About 1% of the packets take as long as 240 µs
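
A back-of-the-envelope check of these figures, as a sketch only. How the thesis computed its throughput number is not stated here, so the calculation assumes throughput means packet size divided by per-packet creation time. The 50 packets per second rate (one packet every 20 ms) is likewise an assumption used to estimate the load of a single voice stream. The names throughput_mbps and cpu_share are hypothetical.

```python
SRTP_BYTES = 176                     # from the slide
TAG_BYTES = 4                        # from the slide
RTP_BYTES = SRTP_BYTES - TAG_BYTES   # 172 bytes of plain RTP


def throughput_mbps(packet_bytes: int, creation_time_us: float) -> float:
    """Bits per microsecond, which is numerically the same as Mbit/s."""
    return packet_bytes * 8 / creation_time_us


def cpu_share(creation_time_us: float, packets_per_second: int = 50) -> float:
    """Fraction of one CPU spent creating packets for a single stream (assumed rate)."""
    return creation_time_us * packets_per_second / 1_000_000


print(f"tag overhead: {TAG_BYTES / RTP_BYTES:.1%}")
for t_us in (76, 80, 240):
    print(f"{t_us} µs/packet: {throughput_mbps(SRTP_BYTES, t_us):.1f} Mbit/s, "
          f"{cpu_share(t_us):.2%} of a CPU per stream")
```

Under these assumptions the 76-80 µs range corresponds to roughly 17.6 to 18.5 Mbit/s, the same order as the 20 Mbps on the slide, and a single stream at 80 µs per packet uses about 0.4% of the CPU.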

Slide Notes

I. Johansson and M. Westerlund, “Support for Reduced-Size Real-Time Transport Control Protocol (RTCP): Opportunities and Consequences”, Internet Request for Comments, RFC Editor, RFC 5506 (Proposed Standard), ISSN 2070-1721, April 2009 http://www.rfc-editor.org/rfc/rfc5506.txt

M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman, “The Secure Real-time Transport Protocol (SRTP)”, Internet Request for Comments, RFC Editor, RFC 3711, ISSN 2070-1721, March 2004, Updated by RFC 5506, http://www.rfc-editor.org/rfc/rfc3711.txt

Israel M. Abad Caballero, Secure Mobile Voice over IP, M.Sc. Thesis, Royal Institute of Technology (KTH), Dept. of Microelectronics and Information Technology, Stockholm, Sweden, June 2003. https://urn.kb.se/resolve?urn=urn%3Anbn%3Ase%3Akth%3Adiva-93113

Transcript

[slide363] And Elisabetta Carrara, who was one of my former doctoral students who did a master's thesis called "Secure Mobile Voice over IP". She introduced a protocol called Secure RTP. And Elisabetta was extremely clever. Because one of the problems, because she worked for Ericsson, they were very concerned about overhead. We already saw we had this huge overhead for a small amount of content. If we were to provide additional security, there was a risk, oops, we could have even more overhead. So the problem is, how can you manage to provide improved security that does things like provides authentication, provides encryption, avoids undetected modifications, avoids replays, without adding overhead to your packet?

And she very cleverly said, hmm, I've already got a sequence number, right, can I exploit the fact that I already have sequence numbers? Because now I can use those sequence numbers as part of my process to be able to detect replays. What I need to do is only add four bytes for the authentication tag if we do message authentication. So we do have to add some data there so we can make sure it hasn't been tampered with. But all of the rest can be leveraged off the things we already have in the RTP header. And we'll see how this works. But the cool thing is, you can have very high performance. So there's basically no reason not to secure your media.

And the approach is, you use a cryptographic context, you derive a session key from a key that got exchanged by another protocol called MIKEY that she also introduced, you encrypt the RTP payload, if you have authentication, you add the authentication tag, and now you simply send the packet out the socket. The receiver does the reverse. They get the packet. If there's authentication, they check it. If it's not authentic, they get rid of it. Otherwise, they decrypt it. And now they use the sequence number to decide, hey, have I already seen that packet? If I have, what do I do? Throw it away, because it's a replay. And off you go.

So it's incredibly simple. But it was a very important observation that she made, that I could do this and have very low additional overhead.
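
The replay check described in the transcript is commonly done with a sliding window over packet indices. The sketch below is one way to do that, not the exact procedure of RFC 3711. The class name ReplayWindow and its method are hypothetical, the window size of 64 is an assumed default, and the index is taken to be a plain increasing integer, so the 16-bit sequence number wrap-around (handled in SRTP by a rollover counter) is ignored.

```python
class ReplayWindow:
    """Sliding-window replay detector over integer packet indices (illustrative sketch)."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = -1  # highest index accepted so far
        self.bitmap = 0    # bit i set => index (highest - i) was already accepted

    def check_and_update(self, index: int) -> bool:
        """Return True and record the index if it is fresh; False if replayed or too old."""
        if index > self.highest:
            # Newer than anything seen: slide the window forward.
            shift = index - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = index
            return True
        offset = self.highest - index
        if offset >= self.size:
            return False  # fell out of the window: treat as a replay
        if self.bitmap & (1 << offset):
            return False  # already seen: replay
        self.bitmap |= 1 << offset  # late but new packet
        return True


window = ReplayWindow()
results = [window.check_and_update(i) for i in (0, 5, 3, 3, 0, 100, 5)]
# results == [True, True, True, False, False, True, False]
```

In a real receiver the window should only be updated for packets whose authentication tag has verified; otherwise a forged packet could mark a genuine sequence number as already seen.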