Another advantage of SIP is that it keeps the session (signalling) separate from the media, which allows great flexibility in the payloads it can support. This separation also means that the two data streams can be encrypted independently of one another. The SIP signalling can be encrypted with the TLS protocol (much as HTTP is secured to give HTTPS), in which case it is called SIPS, and the media stream (voice data) can be encrypted with the SRTP protocol.
For the encryption to be effective, both data streams (session and media) must be encrypted together. Symmetric encryption methods are used because they cost less in performance and resources. The symmetric keys for the media stream are exchanged via SDP (Session Description Protocol) inside the SIP signalling, so they would be exposed to an attacker if the SIP signalling itself were left unencrypted.
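To illustrate why an unencrypted SIP message exposes the media keys, here is a minimal Python sketch. It assumes the keys are carried in an SDP `a=crypto` attribute as defined for SDES in RFC 4568, and that one of the AES_CM_128 suites is used (16-byte master key followed by a 14-byte master salt). The names `SrtpCrypto` and `parse_crypto_line`, and the demo key material, are made up for this example.

```python
import base64
from dataclasses import dataclass


@dataclass
class SrtpCrypto:
    tag: int
    suite: str
    master_key: bytes
    master_salt: bytes


def parse_crypto_line(line: str) -> SrtpCrypto:
    """Parse one SDP 'a=crypto:' line (RFC 4568 layout, AES_CM_128 suites only)."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("expected: tag, crypto suite, key parameters")
    tag, suite, key_params = fields[0], fields[1], fields[2]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("only the 'inline' key method is handled here")
    # Optional lifetime and MKI fields follow the key, separated by '|'.
    raw = base64.b64decode(key_info.split("|")[0])
    if len(raw) != 30:
        raise ValueError("expected 16-byte key plus 14-byte salt")
    return SrtpCrypto(int(tag), suite, raw[:16], raw[16:])


# Demo key material (obviously not a real key): bytes 0..29.
demo_material = bytes(range(30))
demo_line = (
    "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:"
    + base64.b64encode(demo_material).decode("ascii")
    + "|2^20"
)

offer = parse_crypto_line(demo_line)
print(offer.suite, offer.master_key.hex(), offer.master_salt.hex())
```

Anyone who can read the SIP message can run the same few lines, which is why the key only stays secret when the signalling is itself carried over TLS.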
The symmetric keys used by TLS are likewise exchanged at the beginning of the session, but here the SSL certificates come into play: the exchange of the symmetric keys is protected by the asymmetric keys belonging to the SSL certificates.
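On the client side, the certificate check and key exchange described above are handled by the TLS library. The following Python sketch uses the standard `ssl` module; the function names are invented for this example, port 5061 is the default SIPS port from RFC 3261, and the TLS 1.2 minimum is an assumption, not something the text prescribes.

```python
import socket
import ssl


def make_sips_client_context() -> ssl.SSLContext:
    """TLS context that verifies the server certificate and hostname."""
    ctx = ssl.create_default_context()  # system CA store, hostname check on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed policy choice
    return ctx


def open_sips_connection(host: str, port: int = 5061) -> ssl.SSLSocket:
    """Open a TCP connection and run the TLS handshake on top of it.

    The handshake authenticates the server with its certificate and
    agrees on the symmetric session keys; SIP messages written to the
    returned socket are then encrypted with those keys.
    """
    raw = socket.create_connection((host, port), timeout=5)
    return make_sips_client_context().wrap_socket(raw, server_hostname=host)


ctx = make_sips_client_context()
print(ctx.verify_mode, ctx.check_hostname)
```

`open_sips_connection` is not called here because it needs a reachable SIPS server; the host name would be that of the SBC or softswitch in a real deployment.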
TLS and SRTP each encrypt only the connection between two adjacent SIP points. When SIP terminal equipment 1 communicates via SBC – softswitch – SBC with SIP terminal equipment 2, that already makes four separate sections, each of which requires its own encryption. For this reason, encryption is often carried out only between the terminal equipment and the SBC.
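The section count can be made explicit with a small Python sketch. The node labels are illustrative, and the "edge only" policy is just one way of modelling the common case where only the terminal-to-SBC sections are encrypted.

```python
def hops(path):
    """Return the adjacent pairs (sections) along a signalling path."""
    return list(zip(path, path[1:]))


PATH = ["SIP terminal 1", "SBC A", "softswitch", "SBC B", "SIP terminal 2"]
TERMINALS = {PATH[0], PATH[-1]}

sections = hops(PATH)

# Edge-only policy: a section is encrypted if one end is a terminal.
edge_only = [(a, b, a in TERMINALS or b in TERMINALS) for a, b in sections]

for a, b, encrypted in edge_only:
    state = "encrypted" if encrypted else "plain"
    print(f"{a} -> {b}: {state}")
```

With this path there are four sections, and the edge-only policy leaves the two inner sections (SBC to softswitch) unencrypted, which is acceptable only if that part of the network is trusted.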
Measurements in an experimental setup (The Impact of TLS on SIP Server Performance, 2010) showed that using TLS can reduce the performance of a software-based SIP softswitch by a factor of up to 17 compared with traditional SIP-over-UDP.