SDES (Session Description Protocol Security Descriptions) for Media Streams is a way to negotiate the key for the Secure Real-time Transport Protocol (SRTP). It was proposed to the IETF for standardization in July 2006.
The keys are transported in the SDP attachment of a SIP message. The SIP transport layer must therefore ensure that no one else can see the attachment. This can be done by using TLS as the transport layer, or by other methods such as S/MIME. Using TLS assumes that the next hop in the SIP proxy chain can be trusted and that it will take care of the security requirements of the request.
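The TLS requirement can be sketched as a check made before a key is placed in the SDP attachment. This is a minimal illustration, not part of any SIP stack: the function name `is_secure_hop` is hypothetical, and it only covers the TLS case (not S/MIME). It assumes the caller passes the request URI and the topmost Via header value as plain strings.

```python
def is_secure_hop(request_uri, via_header):
    """Return True if the request uses the sips scheme and the Via transport is TLS."""
    # The scheme is everything before the first colon, e.g. "sips".
    scheme = request_uri.split(":", 1)[0].lower()
    # A Via value looks like "SIP/2.0/TLS host:port;params".
    sent_protocol = via_header.split()[0]
    transport = sent_protocol.split("/")[-1].upper()
    return scheme == "sips" and transport == "TLS"


if __name__ == "__main__":
    print(is_secure_hop(
        "sips:*97@ietf.org;user=phone",
        "SIP/2.0/TLS 172.20.25.100:2049;branch=z9hG4bK-s5kcqq8jqjv3;rport",
    ))
```

Even when this check passes, it only says something about the first hop; whether later hops protect the key depends on the proxies in the chain.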
The main advantage of this method is that it is extremely simple. The key exchange method has already been adopted by several vendors, even though some of them do not use a secure mechanism to transport the key. This adoption helps to build the critical mass of implementations needed to make the method a de facto standard.
To illustrate the principle with an example, the phone sends a call request to the proxy. By using the sips scheme, it indicates that the call must be made secure. The key is base-64 encoded in the SDP attachment.
INVITE sips:*97@ietf.org;user=phone SIP/2.0
Via: SIP/2.0/TLS 172.20.25.100:2049;branch=z9hG4bK-s5kcqq8jqjv3;rport
From: "123"
The phone receives the answer from the proxy and now there can be a two-way secure call:
SIP/2.0 200 Ok
Via: SIP/2.0/TLS 172.20.25.100:2049;branch=z9hG4bK-s5kcqq8jqjv3;rport=62401;received=66.31.106.96
From: "123"
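The base-64 encoding of the key in the SDP attachment can be sketched as follows. This is an illustration under stated assumptions, not the messages exchanged above: the `a=crypto` attribute syntax and the AES_CM_128_HMAC_SHA1_80 suite (16-byte master key followed by a 14-byte master salt) are taken from RFC 4568, and the function names `build_crypto_line` and `parse_crypto_line` are hypothetical.

```python
import base64
import os

# Assumed lengths for the AES_CM_128_HMAC_SHA1_80 suite.
MASTER_KEY_LEN = 16
MASTER_SALT_LEN = 14


def build_crypto_line(tag, key, salt, suite="AES_CM_128_HMAC_SHA1_80"):
    """Return an SDP a=crypto attribute carrying key||salt as base-64."""
    inline = base64.b64encode(key + salt).decode("ascii")
    return f"a=crypto:{tag} {suite} inline:{inline}"


def parse_crypto_line(line):
    """Split an a=crypto attribute into tag, suite, master key and master salt."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not a crypto attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("incomplete crypto attribute")
    # Any session parameters after the third field are ignored here.
    tag, suite, key_params = fields[0], fields[1], fields[2]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("unsupported key method: " + method)
    # Optional lifetime and MKI fields follow the key, separated by '|'.
    raw = base64.b64decode(key_info.split("|")[0])
    if len(raw) != MASTER_KEY_LEN + MASTER_SALT_LEN:
        raise ValueError("unexpected key material length")
    return int(tag), suite, raw[:MASTER_KEY_LEN], raw[MASTER_KEY_LEN:]


if __name__ == "__main__":
    # Random key material, generated only for this demonstration.
    line = build_crypto_line(1, os.urandom(MASTER_KEY_LEN), os.urandom(MASTER_SALT_LEN))
    print(line)
    print(parse_crypto_line(line)[:2])
```

Because the key travels as readable text in the attribute, anyone who can see the SDP attachment can decode it, which is why the signaling path has to be protected.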
A common problem with secure media is that the key exchange might not be finished when the first media packet arrives. To avoid audible clicks at the start of the call, those packets must be dropped. This period is usually short (below 100 ms), so it is not a major problem.
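The dropping behaviour can be sketched as a receiver that discards media until a key has been installed. This is a simplified model rather than an SRTP implementation: the class `SrtpReceiver` and its methods are hypothetical names, and the decryption step is a placeholder that returns the packet unchanged.

```python
class SrtpReceiver:
    """Drops inbound media until the key exchange has completed."""

    def __init__(self):
        self.master_key = None
        self.dropped = 0

    def install_key(self, master_key):
        # Called once the SDP carrying the key has been processed.
        self.master_key = master_key

    def on_packet(self, packet):
        if self.master_key is None:
            # No key yet: playing this packet would produce noise, so discard it.
            self.dropped += 1
            return None
        return self._decrypt(packet)

    def _decrypt(self, packet):
        # Placeholder: a real receiver would authenticate and decrypt here.
        return packet


if __name__ == "__main__":
    receiver = SrtpReceiver()
    receiver.on_packet(b"early media")      # dropped, no key yet
    receiver.install_key(b"\x00" * 30)      # dummy key material
    receiver.on_packet(b"later media")      # passed on to the decoder
    print("dropped packets:", receiver.dropped)
```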
The SDES method does not address end-to-end media encryption. For example, if user A is talking to user B via a proxy P, SDES allows negotiation of keys between A and P or between B and P, but not between A and B. For end-to-end media security you must first establish a trust relationship with the other side. If you use a trusted intermediary for this, the call setup delay will increase significantly, which makes applications like push-to-talk difficult. If you do this peer-to-peer, it might be difficult to identify the other side. For example, your operator might implement a B2BUA architecture and play the role of the other side, so that you still do not have end-to-end security. Newer protocols such as ZRTP offer end-to-end encryption for SIP/RTP calls.