What's RTP?

RTP is the Real-Time Transport Protocol, an Internet standard for the transport of real-time data (such as audio or video). RTP is defined by the Audio-Video Transport Working Group (AVT Working Group) of the Internet Engineering Task Force (IETF). The IETF (http://www.ietf.org) is an open community concerned with the evolution of the Internet and part of the larger Internet Society (ISOC), a professional membership society that oversees the issues that affect the Internet.

Given RTP's pedigree as a design of the Audio-Video Transport Working Group of arguably the Internet's chief standards body, it shouldn't be surprising that Sun adopted RTP as the mechanism for streaming media within the JMF. Thus, writing applications such as video conferencing, or even a player of broadcast media, in the JMF requires the use of RTP. But what is RTP, and where does it fit in the scheme of things? Readers who want to skip the details and simply write streaming media applications without detailed knowledge of RTP can do so for a time. As with much of the JMF, the user is provided with a very abstract model that shields them from much of the detail. Those readers should move on to the next section, concerning RTP and the JMF. However, anyone doing significant work in the area of streaming media will likely need to return to the material in this section at some point.

RTP is described by an RFC (Request for Comments) of the IETF: RFC 1889 (http://www.ietf.org/rfc/rfc1889.txt). Despite the tentative-sounding "Request for Comments" label, RTP as described by the RFC is a standard that has been in its current form since early 1996 and is thus stable. The abstract of the RFC describes RTP as follows:

RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video, or simulation data, over multicast or unicast network services.

The introduction section of the document states:

Applications typically run RTP on top of UDP to make use of its multiplexing and checksum services; both protocols contribute parts of the transport protocol functionality.

The services provided by RTP include the identification of content type (that is, type and format of media) within a data packet, packet numbering, packet time stamping, and the ability to synchronize media streams from different sources. These are a minimal set of services that might be expected from a protocol providing media transport. Because data might be delayed for different intervals, corrupted, or lost, packets can arrive out of order or not arrive at all, and streams synchronized at the source site (for example, captured audio and video in a video conference) can arrive out of sync at the destination. Numbering, time stamps, and identification of content type provide the means to detect these problems and to rectify them. However, higher-level services such as connection negotiation or quality-of-service guarantees aren't part of RTP. RTP was designed to be lean and to make minimal demands on the bandwidth over which the media is transported, so such services lie outside its domain.
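As a rough illustration of how packet numbering lets a receiver notice loss and reordering, the following Java sketch tracks sequence numbers as they arrive. The class name SequenceTracker and its counters are hypothetical and are not part of the JMF. The sketch assumes the 16-bit sequence number defined in RFC 1889, which wraps from 65535 back to 0, and as a simplification it does not distinguish duplicates from late packets.

```java
/** Hypothetical helper: tracks RTP sequence numbers to spot gaps and late packets. */
final class SequenceTracker {
    private int expected = -1; // next sequence number expected; -1 until the first packet
    private int missing;       // numbers skipped over and not (yet) seen
    private int late;          // packets that arrived after a higher-numbered packet

    /** Signed distance from a to b modulo 2^16, so 65535 -> 0 counts as +1. */
    static int distance(int a, int b) {
        return (short) (b - a);
    }

    /** Records the arrival of a packet carrying the given 16-bit sequence number. */
    void onPacket(int seq) {
        if (expected < 0) {
            expected = (seq + 1) & 0xFFFF;
            return;
        }
        int d = distance(expected, seq);
        if (d >= 0) {
            // In order, possibly after a gap of d skipped numbers.
            missing += d;
            expected = (seq + 1) & 0xFFFF;
        } else {
            // Arrived after a later packet; assume it fills an earlier gap.
            late++;
            if (missing > 0) {
                missing--;
            }
        }
    }

    int missing() { return missing; }

    int late() { return late; }
}
```

Feeding the tracker the numbers 65534, 65535, 1, 0, 2 leaves nothing missing and one packet counted as late, because packet 0 turns up after packet 1 and fills the gap across the wrap-around.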

IP and UDP

Figure 9.1 shows the typical case involving streaming media via RTP, and the lower-level protocols upon which it sits. Although RTP doesn't require UDP (User Datagram Protocol), UDP is by far the most common protocol atop which RTP is implemented.

Figure 9.1. Most common layering of RTP atop UDP/IP to provide media streaming capabilities.


IP (Internet Protocol), which is more commonly heard as part of TCP/IP (Transmission Control Protocol atop Internet Protocol), is a low-level protocol by which most hosts on the Internet communicate with one another. It is a means by which hosts and routers ensure that data packets travel from source to destination host while hiding the details of the transmission medium. A number of protocols are built atop IP.

UDP is a lightweight communication protocol for the transportation of data packets. Inherently packet oriented, UDP is a low-overhead protocol (as opposed to, say, TCP) because of the restricted services it provides. No guarantee is made of packet delivery; UDP provides effectively blind transmission of data. This means that packets can be lost, corrupted, or delivered out of order, and one or both ends of the communication channel could be unaware of the fact. Hence, it isn't uncommon for higher-level protocols to be built atop UDP in order to capitalize on its efficiency while building in the possibility of error checking and recovery. RTP is such a protocol.

RTP and RTCP

RTP is augmented by a control protocol—RTCP (RTP Control Protocol). The purpose of RTCP is to provide information about the quality of service of an RTP connection by identifying the participants and relevant information about each. Such information is sent by each participant and includes the number of packets received (if receiving) and sent (if sending), and other timing (clock) and synchronization information. The same RFC (http://www.ietf.org/rfc/rfc1889.txt) that describes RTP also describes RTCP.

All RTP packets are composed of two parts: a header and the associated payload. The header includes a payload type (type of media), sequence number (packet number within the media sequence), time stamp, synchronization source identifier, and an optional list of contributing sources (where the media originated from, when several streams have been mixed). The header can range in size from 12 bytes (the most common case of media originating from a single source) to 72 bytes (the 12 fixed bytes plus the maximum of 15 contributing source identifiers of 4 bytes each). The payload is the media data itself.
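The header layout can be made concrete with a short parsing sketch in Java. It follows the bit layout given in RFC 1889 (version and contributing-source count in the first byte, payload type in the second, then sequence number, time stamp, and synchronization source). The class RtpHeader and its field names are hypothetical and are not JMF classes; the padding, extension, and marker bits are ignored to keep the sketch short.

```java
import java.nio.ByteBuffer;

/** Hypothetical holder for the fields of an RTP header as laid out in RFC 1889. */
final class RtpHeader {
    final int version;        // 2 bits
    final int payloadType;    // 7 bits
    final int sequenceNumber; // 16 bits, unsigned
    final long timestamp;     // 32 bits, unsigned
    final long ssrc;          // synchronization source, 32 bits, unsigned
    final long[] csrcs;       // 0 to 15 contributing sources, 32 bits each

    private RtpHeader(int version, int payloadType, int sequenceNumber,
                      long timestamp, long ssrc, long[] csrcs) {
        this.version = version;
        this.payloadType = payloadType;
        this.sequenceNumber = sequenceNumber;
        this.timestamp = timestamp;
        this.ssrc = ssrc;
        this.csrcs = csrcs;
    }

    /** Header length in bytes: 12 fixed bytes plus 4 per contributing source. */
    int length() {
        return 12 + 4 * csrcs.length;
    }

    static RtpHeader parse(byte[] packet) {
        if (packet.length < 12) {
            throw new IllegalArgumentException("shorter than the 12-byte fixed header");
        }
        ByteBuffer buf = ByteBuffer.wrap(packet); // big-endian, i.e. network byte order
        int b0 = buf.get() & 0xFF;
        int b1 = buf.get() & 0xFF;
        int version = b0 >>> 6;    // top 2 bits
        int csrcCount = b0 & 0x0F; // low 4 bits
        int payloadType = b1 & 0x7F; // low 7 bits (top bit is the marker)
        int sequenceNumber = buf.getShort() & 0xFFFF;
        long timestamp = buf.getInt() & 0xFFFFFFFFL;
        long ssrc = buf.getInt() & 0xFFFFFFFFL;
        if (packet.length < 12 + 4 * csrcCount) {
            throw new IllegalArgumentException("truncated contributing source list");
        }
        long[] csrcs = new long[csrcCount];
        for (int i = 0; i < csrcCount; i++) {
            csrcs[i] = buf.getInt() & 0xFFFFFFFFL;
        }
        return new RtpHeader(version, payloadType, sequenceNumber, timestamp, ssrc, csrcs);
    }
}
```

Everything after the first length() bytes of the packet is the payload.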

RTCP packets are sent as compound packets (each consisting of at least two individual packets, one of which is always a Source Description). Each individual packet is one of five types:

Sender Report— Produced by those who have been sending packets recently. A Sender Report includes the total number of packets and bytes sent as well as synchronization (timing) information.

Receiver's Report— Produced by those who have been receiving packets recently. Participants send a Receiver's Report packet for each participant they are receiving data from. Information includes number of packets lost, highest (packet) sequence number received, and a timestamp that can be used by the sender to estimate the lag/latency between sender and receiver.

Source Description— Describes the source of the report by its canonical name; may also carry other information such as e-mail addresses or physical locations.

Bye— Sent by a participant who is leaving the session. Might include the reason for leaving.

Application Specific— A means for applications to define their own messaging across RTCP.
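The five packet types can be sketched as a Java enum. The numeric codes (200 through 204) are the packet type values assigned in RFC 1889; the enum name RtcpPacketType and the helper typesIn are hypothetical and are not part of the JMF. The helper walks a compound packet using the 16-bit length field of each RTCP header, which the RFC defines as the packet length in 32-bit words minus one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical enum of the five RTCP packet types with their RFC 1889 codes. */
enum RtcpPacketType {
    SENDER_REPORT(200),
    RECEIVER_REPORT(201),
    SOURCE_DESCRIPTION(202),
    BYE(203),
    APPLICATION_SPECIFIC(204);

    final int code;

    RtcpPacketType(int code) {
        this.code = code;
    }

    static RtcpPacketType fromCode(int code) {
        for (RtcpPacketType t : values()) {
            if (t.code == code) {
                return t;
            }
        }
        throw new IllegalArgumentException("unknown RTCP packet type " + code);
    }

    /** Lists the types of the individual packets inside one compound RTCP packet. */
    static List<RtcpPacketType> typesIn(byte[] compound) {
        List<RtcpPacketType> types = new ArrayList<>();
        int offset = 0;
        while (offset + 4 <= compound.length) {
            int type = compound[offset + 1] & 0xFF; // second header byte
            int lengthWords = ((compound[offset + 2] & 0xFF) << 8)
                    | (compound[offset + 3] & 0xFF); // length in words, minus one
            types.add(fromCode(type));
            offset += (lengthWords + 1) * 4;         // skip to the next packet
        }
        return types;
    }
}
```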

RTP Applications

RTP applications can be divided into clients, those that passively receive, and servers, those that actively transmit. Some, such as video-conferencing software, are both clients and servers—transmitting and receiving data.

The following terminology describes RTP as used by RTP applications:

RTP Session— An association between a group of applications, all communicating via RTP. A session is identified by a network address and a pair of ports: one for the RTP packets and one for the RTCP packets. Each media type has its own session. Hence the stereotypical video conference (involving both audio and video) consists of two sessions, one for audio and one for video, regardless of how many applications participate.

RTP Participant— An application taking part in an RTP Session.

RTP Port— An integer number used to differentiate between different applications on the same machine. Many common network services have a port associated with them.
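To make the session terminology concrete, here is a minimal Java sketch of a session identifier. It assumes the convention recommended in RFC 1889 that RTP uses an even port number and the matching RTCP stream uses the next higher (odd) port. The class RtpSessionAddress is hypothetical and is not a JMF class, and the address and port numbers in the usage note are made up for illustration.

```java
/** Hypothetical value class: one RTP session is one address plus an RTP/RTCP port pair. */
final class RtpSessionAddress {
    final String address;
    final int rtpPort;

    RtpSessionAddress(String address, int rtpPort) {
        // Assumed convention: even RTP port, with the next (odd) port left for RTCP.
        if (rtpPort < 0 || rtpPort > 65534 || rtpPort % 2 != 0) {
            throw new IllegalArgumentException("RTP port should be even and below 65535");
        }
        this.address = address;
        this.rtpPort = rtpPort;
    }

    /** The RTCP port paired with this session's RTP port. */
    int rtcpPort() {
        return rtpPort + 1;
    }

    @Override
    public String toString() {
        return address + " rtp=" + rtpPort + " rtcp=" + rtcpPort();
    }
}
```

A video conference would then be described by two such objects, for example new RtpSessionAddress("224.2.0.1", 22224) for audio and new RtpSessionAddress("224.2.0.1", 22226) for video.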

Unicast, Multi-Unicast, Broadcast, and Multicast

IP supports a number of addressing schemes: unicast, broadcast, and multicast. The type of addressing scheme is indicated by the IP address of a packet. The three modes can be used in conjunction with RTP (and the JMF).

Unicast, also known as point-to-point, is by far the most common addressing scheme in use on the Internet today, and it describes the transmission of a packet (from a source) to a single address. Figure 9.2 is a schematic of this addressing scheme. In a time-based media context, this approach would be the most sensible for a simple two-person Internet phone scenario—two people transmitting directly to each other.

Figure 9.2. Typical unicast scenario—point-to-point.


Multi-unicast is a simple expansion of unicast in that the transmitter sends duplicates of packets to a number of hosts, not just one. In multi-unicast, the packets are duplicated, so it has none of the bandwidth advantages of the multicast approach. Figure 9.3 is a schematic of this scheme. In Figure 9.3's scenario, the transmitter's data is duplicated and sent as two separate streams to the two recipients. A video-conferencing application between three or four participants might use multi-unicast: Each participant would know the address of the other members involved and transmit (audio and video streams) directly to each of those addresses.

Figure 9.3. A multi-unicast scenario with two recipients.


Broadcast describes the transmission of packets to all hosts on a particular subnet. Although it offers bandwidth savings (packets are not duplicated until necessary), it is limited to a single subnet. Figure 9.4 is a schematic of this approach to addressing. As an example, broadcast might be used within an organization to send a video to all machines.

Figure 9.4. Typical Broadcast transmission—the data is sent to all machines on a particular subnet.


Multicast describes the most sophisticated and versatile means of addressing, and one that is of particular significance for many time-based media applications. Multicast is a receiver-centric scheme. The transmitter sends to a single address, that of a multicast session. Receivers join a session by indicating that they want to listen to the address associated with that session. The network infrastructure (the routers) is then responsible for delivering data to all receivers (listeners). Figure 9.5 is a schematic of the approach.

Figure 9.5. Typical multicast transmission scenario.


Multicasting is of particular significance to applications such as multi-participant video-conferencing for at least two reasons. First, each participant isn't required to maintain an up-to-date list of other participants—a difficult task because participants come and go for various reasons. Each participant simply transmits and listens to the session address. Second, a multicast scheme means that data packets aren't duplicated until necessary, implying considerable potential bandwidth savings. Only when the route to listeners to a session diverges are the packets duplicated. This is all supported by the network.
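The bandwidth argument can be put in numbers with a small Java sketch that counts the streams senders must transmit for one media type. It assumes that every participant transmits to every other participant; the class and method names are hypothetical.

```java
/** Hypothetical helper comparing how many streams senders transmit under each scheme. */
final class StreamCount {
    /** Multi-unicast: each of n participants sends a separate copy to the other n - 1. */
    static int multiUnicastStreams(int participants) {
        return participants * (participants - 1);
    }

    /** Multicast: each participant sends once, to the session address. */
    static int multicastStreams(int participants) {
        return participants;
    }
}
```

For a four-person conference this gives 12 streams under multi-unicast against 4 under multicast, and the gap widens quadratically as participants are added.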

Certain network addresses, namely those in the range 224.0.0.0 to 239.255.255.255, are assigned by IANA (Internet Assigned Numbers Authority) for multicast applications. Addresses within that range are further subdivided into various assigned purposes. For instance, the addresses from 224.2.0.0 to 224.2.127.253 (inclusive) are currently assigned for multimedia conference calls. The complete list of multicast assigned numbers can be found at http://www.iana.org/assignments/multicast-addresses.
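An application might check whether an address falls in these ranges before using it for a session. The Java sketch below does so by comparing dotted-quad IPv4 addresses numerically. The class MulticastRanges and its methods are hypothetical; the range boundaries are the ones quoted above. The standard library also offers java.net.InetAddress.isMulticastAddress() for the general multicast test.

```java
/** Hypothetical helper that tests IPv4 addresses against the ranges quoted in the text. */
final class MulticastRanges {
    /** Converts a dotted-quad IPv4 address such as "224.2.0.1" to a number. */
    static long toLong(String dottedQuad) {
        String[] parts = dottedQuad.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted-quad address: " + dottedQuad);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    static boolean inRange(String address, String first, String last) {
        long a = toLong(address);
        return a >= toLong(first) && a <= toLong(last);
    }

    /** True for any address in the multicast range 224.0.0.0 to 239.255.255.255. */
    static boolean isMulticast(String address) {
        return inRange(address, "224.0.0.0", "239.255.255.255");
    }

    /** True for the multimedia conference call block, 224.2.0.0 to 224.2.127.253. */
    static boolean isConferenceCall(String address) {
        return inRange(address, "224.2.0.0", "224.2.127.253");
    }
}
```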

Multicasting is a complex topic, particularly in terms of how the routing is achieved. That is further complicated by the fact that not all older routers are capable of supporting multicast packets. To this end, MBONE (the Internet Multicast Backbone) was created as a group of networks and routers that supported multicast.
