Chapter 18

Video over IP

Video over IP is becoming increasingly common. Ever larger numbers of people are switching from cable or satellite TV to IPTV, downloading programming over their internet connection. This can allow them to select what they want to watch, and avoid monthly cable or satellite fees. Websites such as YouTube provide both uploading and downloading of video content to millions of users. Even pornography is becoming an increasingly web-hosted video business.

18.1 Basics of Internet Protocol (IP)

Internet protocol is a communication system. Unlike the telephone system, it does not require a connection between the sender and receiver. Instead, the information is broken up into packets, and each packet finds its own path over the IP network from source to destination. These networks can be public, such as the internet, or private, such as a corporate network. The source, the destination and every node in the network each have an address, which is 32 bits, or eight hexadecimal digits. When expressed decimally, it is in the familiar dotted form xxx.xxx.xxx.xxx: four decimal numbers, each between 0 and 255. The packets have two major components: the header and the data. The IP header is normally 20 bytes, and the data is of variable length, up to 65,515 bytes (the total packet length, header included, cannot exceed 65,535 bytes).
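The relationship between the 32-bit address, its eight hexadecimal digits and the dotted decimal form can be illustrated with a short sketch. It uses Python's standard `ipaddress` module; the function name `describe_ipv4` and the example address are chosen here purely for illustration.

```python
import ipaddress

def describe_ipv4(dotted: str) -> dict:
    """Show one IPv4 address in its common notations."""
    addr = ipaddress.IPv4Address(dotted)
    value = int(addr)  # the address as a single 32-bit integer
    return {
        "dotted": str(addr),
        "integer": value,
        "hex": f"{value:08X}",    # eight hexadecimal digits
        "bits": f"{value:032b}",  # 32 binary digits
    }

info = describe_ipv4("192.168.1.10")
print(info["dotted"], "=", info["hex"])
```

Each of the four decimal numbers corresponds to one byte, and therefore to two of the eight hexadecimal digits.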

IP addresses can be constant or temporary. This is best illustrated by example: an organization or corporation may have a unique, constant IP address, used to communicate over the public IP network. However, within the corporation there’s a private IP network not intended to be accessible to outsiders, which may have hundreds or thousands of different computers or devices with IP addresses. Using dynamic host configuration protocol (DHCP), a router within the corporation can assign temporary IP addresses to these nodes on the private network. Certain ranges of IP addresses are reserved for use on private networks, and DHCP typically assigns addresses from these ranges. These addresses are assigned whenever a computer tries to connect to the private network. The dynamically assigned addresses need only to be unique within the private network – other private networks can use the same range of addresses within their own private network. A function in the router bridging the private network to the external network, called network address translation (NAT), is used to translate between the address space of the public internet and the DHCP-assigned addresses within the private network.
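The translation step can be pictured as a lookup table held in the router. The sketch below is a toy model only: it assumes the common port-based form of NAT, in which each private address and port pair is mapped to a distinct port on the single public address. The class name `SimpleNat`, the starting port and the example addresses are all hypothetical.

```python
import ipaddress
from itertools import count

class SimpleNat:
    """Toy port-based NAT table (illustrative, not a real router)."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._next_port = count(first_port)
        self._out = {}  # (private_ip, private_port) -> public_port
        self._in = {}   # public_port -> (private_ip, private_port)

    def outbound(self, private_ip: str, private_port: int):
        """Translate a private source address to the public one."""
        if not ipaddress.ip_address(private_ip).is_private:
            raise ValueError("expected a private-network address")
        key = (private_ip, private_port)
        if key not in self._out:
            port = next(self._next_port)
            self._out[key] = port
            self._in[port] = key
        return self.public_ip, self._out[key]

    def inbound(self, public_port: int):
        """Find the private destination for a returning packet, if any."""
        return self._in.get(public_port)

nat = SimpleNat("203.0.113.5")
print(nat.outbound("10.0.0.5", 5000))
print(nat.inbound(40000))
```

Because the table is local to one router, another private network can reuse the same private addresses without conflict.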

Routers are the key components that allow anyone to transmit data to anyone else over the internet. Routers are distributed throughout the IP networks – they examine the headers, and, using the destination address, forward the packets towards their destinations. Since the packet may pass through many routers, each router must decide how best to forward the packet to get it closer to its destination. This is done by maintaining large routing tables, and monitoring the status of various network connections for traffic levels, and sometimes by determining priority for a given packet.
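A routing-table lookup can be sketched in a few lines. The example assumes the usual longest-prefix-match rule, in which the most specific matching entry wins; that rule, the table contents and the port names are illustrative assumptions rather than details given above, and a real router also weighs link status and priority.

```python
import ipaddress

# Hypothetical routing table: (destination network, outgoing port)
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "uplink"),      # default route
    (ipaddress.ip_network("10.0.0.0/8"), "port-1"),
    (ipaddress.ip_network("10.20.0.0/16"), "port-2"),
]

def next_hop(destination: str, routes=ROUTES):
    """Return the port of the most specific route containing the address."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, port) for net, port in routes if addr in net]
    if not matches:
        return None
    # The longest prefix is the most specific match.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

for dest in ("10.20.3.4", "10.1.1.1", "8.8.8.8"):
    print(dest, "->", next_hop(dest))
```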

Figure 18.1 IP packet formatting.

18.2 Encapsulation

The video data may be continuously streaming or form a single large file. However it is formatted, it must be broken up into chunks to fit into IP packets, a process called encapsulation. A simple method could use the largest possible data size for each IP packet. However, this has several disadvantages. Using very large IP packets increases latency, because the packet cannot be sent until enough data is made available by the source. If a packet gets corrupted over the network, a large and noticeable amount of data is lost. Also, large packets may be fragmented, meaning they may be broken into multiple packets during transmission so that they can be carried by some networks – for example, standard Ethernet only allows transmission of up to 1500 bytes in a packet.

Very short packets, on the other hand, are inefficient. Since the packet header is a constant size, it uses a larger portion of the total packet size. More packets also put more load on the routers in the network to process and route, as each packet is treated individually. Sensitivity to latency as well as the packet error rates on a given network can be used to determine packet data lengths.
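The trade-off between header overhead and latency can be put into numbers. The sketch below assumes the 20-byte IP header described earlier and ignores any further headers; the function names and the 64 kbps example rate are chosen for illustration only.

```python
IP_HEADER_BYTES = 20

def payload_efficiency(payload_bytes: int,
                       header_bytes: int = IP_HEADER_BYTES) -> float:
    """Fraction of each packet that is useful data."""
    return payload_bytes / (payload_bytes + header_bytes)

def packet_fill_latency_ms(payload_bytes: int, source_bps: float) -> float:
    """Time the source needs to produce enough data to fill one packet."""
    return payload_bytes * 8 / source_bps * 1000

for size in (40, 188, 1480):
    eff = payload_efficiency(size) * 100
    wait = packet_fill_latency_ms(size, 64_000)
    print(f"{size:5d} bytes: {eff:5.1f}% efficient, {wait:6.1f} ms to fill")
```

Short packets waste a larger share of the bandwidth on headers, while long packets make a low-rate source wait longer before anything can be sent.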

18.3 Video Streams

Compressed video uses several types of streams for transport over a network. The simplest is the elementary stream, which contains just the compressed data output from the video encoder and does not contain audio or synchronization data.

A program stream contains several elementary streams, for video, audio or data. It contains everything needed for a given program to be presented. The data might be for on-screen text message overlay, or it might be used for production and recording functions. Time stamps are added to each of the elementary streams to synchronize them. These enhanced elementary streams are called packetized elementary streams (PES) because the elementary stream has been broken into packets, each associated with a different timestamp. The PES should not be confused with IP packets. PES is associated with how video data is packaged to maintain synchronization in a specific protocol used for compressed video, and has nothing to do with what method or protocols are used to move data, of which the internet is just one of several methods.

There are two time stamps: one is known as the presentation timestamp, indicating when each video packet should be displayed in the video; the other is the decode timestamp, indicating the order that it should be processed by the decoder. As discussed in previous chapters, the order of decode is different from the order of display in a typical GOP, with a mixture of I-, P- and B-frames.
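The difference between display order and decode order can be sketched as follows. The example makes a simplifying assumption: every B-frame depends on the next I- or P-frame in display order, so that frame must be decoded first. The frame labels and the function name are hypothetical.

```python
def decode_order(display_order):
    """Reorder frames so each B-frame follows the reference frame it needs.

    display_order: frame labels such as 'I0', 'B1', 'P3', listed in
    presentation (display) order.
    """
    out, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)   # wait for the next reference frame
        else:
            out.append(frame)         # I- or P-frame is decoded first
            out.extend(pending_b)     # then the B-frames that needed it
            pending_b.clear()
    out.extend(pending_b)             # trailing B-frames, if any
    return out

gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(decode_order(gop))
```

In this picture the position of a frame in the input list plays the role of the presentation timestamp, and its position in the output list plays the role of the decode timestamp.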

Program streams can contain multiple video displays, supporting picture in picture, or multiple video sources – such as an anchor person on one part of the screen and a remote video feed in another. Another example is sports coverage, where multiple angles might be shown simultaneously. Program streams are usually used when little or no data loss is expected in the transmission, and typically use long packets. Applications of program streams include DVD players or within a production studio or trailer.

The transport stream is often used when the video is transmitted over long distances, over different types of multi-user networks. This could be satellite links, broadcast terrestrial links or video over IP. Transport streams feature error correction schemes, such as Reed-Solomon. To facilitate error correction, fixed-length packets of 188 bytes are used. Accounting for additional bytes in error correction, this can increase to 204 or 208 bytes. Transport streams can contain multiple elementary streams, and each is identified with a packet identifier (PID). PIDs are used to differentiate between the video, audio and data elementary streams used to make up a complete program. To keep track of the PIDs, two further structures are used. One is the program association table (PAT), which provides the index of all the programs in the transport stream and is carried in packets with PID = 0. The other structure is the program map table (PMT); there is one for each program, giving the PID numbers for the video stream(s), audio stream(s) and data stream(s) in that program.
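Reading the PID from a transport-stream packet can be sketched as below. The header layout used here (a sync byte of 0x47 followed by a 13-bit PID spread over the next two bytes) comes from the MPEG-2 systems standard rather than from the description above, and `make_packet` is a hypothetical helper that builds an otherwise empty packet for demonstration.

```python
TS_PACKET_BYTES = 188
SYNC_BYTE = 0x47

def parse_pid(packet: bytes) -> int:
    """Extract the 13-bit packet identifier from a 188-byte TS packet."""
    if len(packet) != TS_PACKET_BYTES or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte transport-stream packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]

def make_packet(pid: int) -> bytes:
    """Hypothetical helper: a packet with the given PID and a zeroed payload."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_BYTES - len(header))

for pid in (0x0000, 0x0100, 0x1FFF):
    label = "PAT" if parse_pid(make_packet(pid)) == 0 else "other"
    print(f"PID {pid:#06x}: {label}")
```

A demultiplexer would first collect the packets with PID 0 to read the PAT, then use the tables it finds to decide which other PIDs belong to the wanted program.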

The transport-stream packets are much smaller than the allowable data-packet lengths used in IP. Even restricting the IP packet length to 1500 bytes to prevent fragmentation on Ethernet, there can be seven of the 208 byte long transport stream packets encapsulated.
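The figure of seven can be checked with simple arithmetic. The sketch assumes a 1500-byte limit on the whole IP packet and subtracts the 20-byte IP header; the optional 8-byte and 12-byte allowances for UDP and RTP headers are assumptions added for illustration.

```python
def ts_packets_per_ip_packet(ts_bytes: int,
                             ip_packet_limit: int = 1500,
                             header_bytes: int = 20) -> int:
    """Whole transport-stream packets that fit in one IP packet."""
    return (ip_packet_limit - header_bytes) // ts_bytes

for ts_bytes in (188, 204, 208):
    plain = ts_packets_per_ip_packet(ts_bytes)
    # Assumed extra headers: 8 bytes for UDP and 12 bytes for RTP.
    with_udp_rtp = ts_packets_per_ip_packet(ts_bytes, header_bytes=20 + 8 + 12)
    print(ts_bytes, plain, with_udp_rtp)
```

Under these assumptions the count stays at seven for all three packet lengths, even when room is left for the UDP and RTP headers.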

18.4 Transport Protocols

Video streams are sent over IP networks. IP networks have their own protocols, which are independent of what type of data or application the IP network is being used for. These protocols are not specific to video over IP, but are used for nearly all internet applications.

The most common protocol used is transmission control protocol (TCP), also commonly referred to as transmission control protocol over internet protocol (TCP/IP). TCP establishes a reliable connection between source and destination using defined handshaking schemes. It keeps track of all IP packets by assigning a sequence identifier. TCP can detect if a packet is missing at the destination, and keep packets in order, even if some packets get delayed across the network. If a packet is corrupted, or fails to arrive, the destination can request a retransmission from the source.

This is extremely valuable for many types of service using the internet. For example, when sending an email with a file attachment, every byte must arrive and be reassembled in the correct sequence in order to be useful. Commerce over the internet demands error-free communication.

However, this protocol is ill-suited for some types of service, including streaming video and audio: if an IP packet is corrupted or substantially delayed, it makes little sense to request a retransmission, as the data is needed in a timely manner or not at all. Requesting retransmissions needlessly uses up more network resources and bandwidth. Also, TCP has provisions to reduce data rates if too many packets are lost (known as congestion control), and reducing the data rate can be very disruptive to a streaming application. For example, in a voice over IP phone call, having a few clicks due to lost data is preferable to pauses of silence while waiting for all the audio data to arrive correctly. TCP also incurs more latency or delay due to its error-free connection features.

A simpler alternative protocol is user datagram protocol (UDP). There is no handshaking between source and destination; the data is just sent. There are no attempts at tracking missing IP packets, retransmitting them, or controlling the data rate. It is low latency, simple and low overhead. It is more suited for streaming data applications. If the IP packets contain data that has its own error correction built in, like transport streams, then corrupted data may be corrected.

A protocol that is normally layered on top of UDP is the real-time transport protocol (RTP). RTP is specifically designed for real-time data-streaming applications which cannot tolerate interruption in data flow, and need minimum latency. RTP does provide some additional features compared to UDP: a time-stamping feature to allow multiple streams from a given source to be synchronized, such as video and audio; multicasting support, so one source can send the same data to many destinations simultaneously; and packet sequencing, so lost packets can be detected, which can allow a video decoder, for example, to use previous or nearby video data as a best guess for the lost data. Note that RTP is often used together with the real-time streaming protocol (RTSP), which sets up and controls the streaming session while RTP carries the media data.
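The sequencing and time-stamping features can be sketched by building and reading the 12-byte fixed RTP header. The field layout follows the RTP specification rather than the description above; the payload type value of 33 (MPEG-2 transport stream), the SSRC value and the function names are illustrative assumptions, and sequence-number wraparound is deliberately ignored.

```python
import struct

# version/flags, marker/payload type, sequence, timestamp, source id (SSRC)
RTP_HEADER = struct.Struct("!BBHII")  # 12 bytes, network byte order

def pack_rtp(seq: int, timestamp: int, ssrc: int,
             payload_type: int = 33, payload: bytes = b"") -> bytes:
    first = 2 << 6  # version 2, no padding, no extension, no CSRC entries
    return RTP_HEADER.pack(first, payload_type & 0x7F, seq & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc) + payload

def unpack_rtp(packet: bytes) -> dict:
    first, second, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {"version": first >> 6, "payload_type": second & 0x7F,
            "seq": seq, "timestamp": timestamp, "ssrc": ssrc,
            "payload": packet[RTP_HEADER.size:]}

def missing_sequence_numbers(received):
    """Sequence numbers absent between the lowest and highest seen."""
    seen = set(received)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]

pkt = pack_rtp(seq=7, timestamp=90_000, ssrc=0x1234, payload=b"abc")
print(unpack_rtp(pkt)["seq"], missing_sequence_numbers([1, 2, 4, 7, 5]))
```

A receiver that finds gaps in the sequence numbers knows which packets were lost and can conceal the missing data instead of waiting for a retransmission.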

18.5 IP Transport

Video over IP can be physically transmitted using many methods, some of which are listed below.

Ethernet is familiar to most of us, as it’s commonly used within buildings or campuses and is known as a local area network (LAN). Ethernet uses another address, appended to the packet, known as the media access control (MAC) address – this is a permanent address assigned by the manufacturer of that equipment. The MAC address is a 12-hexadecimal-digit address of the form xx:xx:xx:xx:xx:xx. MAC address ranges are assigned by an international agency to manufacturers of products containing Ethernet ports. For transport within an Ethernet LAN the MAC address, appended to the IP packet, is used. LANs are built up of Ethernet equipment (computers for example), Ethernet hubs, Ethernet switches and Ethernet bridges. Hubs simply act as repeaters: any packet coming in on one port will be sent out on all the other ports. Switches examine MAC addresses, and only forward packets on ports that connect to the MAC addresses for those packets. Bridges connect different LANs together. These could be all Ethernet-based LANs, or could be wireless LANs using 802.11-based wireless technology. Common Ethernet speeds are 100BASE-T (100 Mbps) or gigabit Ethernet (1 Gbps). Faster Ethernet speeds, such as 10 Gbps, are possible on specially designed backplanes or over fiber interfaces.

There is a wider range of technologies used for wide area networking (WAN), which, as the name implies, is over distances ranging from a few miles to thousands of miles. These are usually owned by a carrier, and used as shared resources (the internet for example), or connections can be leased and used privately (connections between corporate offices in different locations, for example). However, IP is still used as the protocol over the different transport services and technologies.

Synchronous Optical Network (SONET) and Synchronous Digital Hierarchy (SDH) are two popular standards used for transmission over fibers, and form the basis of most long-distance telecommunications. The speeds used can be above 10 Gbps. Asynchronous Transfer Mode (ATM) is a protocol used in these networks. ATM operates by allocating a specific amount of bandwidth to a given connection using virtual circuits. It allows for much finer control over the data bandwidth allocated for a given connection or user. However, due to the “guaranteed” bandwidth allocation, ATM tends to be an expensive way to communicate when bandwidth requirements are dynamic.

Fortunately, IP can be layered over ATM, and the user will not even be aware of the ATM protocol running underneath. Cable and digital subscriber lines (DSL) are used for intermediate length connections, typically from IP service providers to homes and small businesses. This is often referred to as the “last mile”. Rather than using fiber, the physical connections are made with coaxial or twisted-pair wiring.

Figure 18.2 Aggregated access to internet.

These lower-speed connections are aggregated into a single higher-speed connection by the service provider, at what is called a central office (though this function could just be an equipment cabinet).

The common technology used across WANs, last mile and LANs is IP and its protocols such as TCP/IP, UDP, and RTP. In particular, compressed video over IP can be streamed across networks made up of all these technologies.

18.6 Video Over Internet Issues

The internet is a remarkable network for all sorts of communication, but it does have serious limitations for video transmission. The available raw bandwidth is large, but because the number of users and the demand from those users vary widely, the available bandwidth for a given user is never guaranteed. Packet losses also occur, perhaps as much as 1%. More problematic is the amount of jitter, or variation in the latency of transmission from packet to packet. A packet that arrives late is the same as a packet that never arrives in a real-time system.
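The effect of jitter on a real-time receiver can be sketched as follows. The example assumes a fixed playout delay, after which a packet is useless, and measures jitter simply as the spread between the largest and smallest delay; real receivers typically use smoothed estimates. All names and timings are hypothetical.

```python
def late_packets(send_ms, arrival_ms, playout_delay_ms):
    """Indices of packets that are lost or arrive after their deadline."""
    late = []
    for i, (sent, arrived) in enumerate(zip(send_ms, arrival_ms)):
        if arrived is None or arrived > sent + playout_delay_ms:
            late.append(i)
    return late

def delay_spread_ms(send_ms, arrival_ms):
    """Peak-to-peak variation in latency over the packets that arrived."""
    delays = [a - s for s, a in zip(send_ms, arrival_ms) if a is not None]
    return max(delays) - min(delays)

send = [0, 20, 40, 60]
arrival = [30, 55, None, 160]  # None marks a packet that never arrived
print(late_packets(send, arrival, playout_delay_ms=60))
print(delay_spread_ms(send, arrival))
```

The third packet is lost outright and the fourth misses its deadline, and the decoder has to treat both in the same way.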

These issues are slightly mitigated by the latest video compression algorithms, which reduce video bandwidth, particularly on low-resolution video delivery. Many program sources of video are available on the internet.

Dedicated non-internet transmission is needed for high-definition video with good quality. This is why the great majority of consumers subscribe to a cable or satellite service, with high quality, dedicated transmission networks of video content.

Movie downloads can be delivered efficiently through the internet: companies like Netflix are moving away from DVDs by mail, towards internet downloads of compressed movie content. One advantage is that the movie can be downloaded in non-real-time, mitigating the issues of streaming video over IP. This is known as download and play. Progressive download and play is somewhere in the middle: the video is broken up into segments, and played as soon as one segment is finished downloading. As long as the next segment can finish downloading prior to the completion of the previous segment's play time, there is no interruption.
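The condition for uninterrupted progressive download can be sketched as a small simulation. It assumes that segments are downloaded one after another and that playback begins as soon as the first segment is complete; the function name and the example timings are hypothetical.

```python
def playback_stalls(segment_seconds, download_seconds):
    """Indices of segments that were not ready when playback reached them."""
    stalls = []
    download_done = 0.0  # time at which the current segment is fully received
    play_clock = None    # time at which the current segment is needed
    for i, (duration, download) in enumerate(
            zip(segment_seconds, download_seconds)):
        download_done += download
        if play_clock is None:
            play_clock = download_done   # playback starts with segment 0
        elif download_done > play_clock:
            stalls.append(i)             # viewer waits for this segment
            play_clock = download_done
        play_clock += duration           # segment plays to completion
    return stalls

print(playback_stalls([10, 10, 10], [4, 6, 8]))
print(playback_stalls([10, 10, 10], [4, 12, 5]))
```

In the first case every segment arrives before the previous one finishes playing; in the second, the slow download of the second segment forces a pause.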

There are many video applications besides movies. Download and play of short YouTube video clips (often low resolution), marketing webcasts, investor and analyst briefings or private corporate broadcasts of executives' speeches are all applications that can be well suited to video over IP transport.

18.7 Video Streaming

Video streaming requires real-time performance of the IP network to deliver the compressed video content. Usually RTP is used over the IP network.

The video source can be a video streaming server, a computer or a webcam. The video server typically has a large storage capability to house the large amounts of video to be broadcast on demand. Alternatively, the video may be live (or almost live), where it is being recorded by a camera, formatted, compressed and then sent out for viewing across the internet (perhaps a webcam). The formatting of the video can be of multiple forms, different video players being supported with one or more of these formats.

The software to do the video formatting and playing is available from several companies, and each has its proprietary methods. Some of the familiar names are: Windows Media Player by Microsoft; QuickTime by Apple; RealPlayer® by RealNetworks; and Adobe.

Newer video-streaming standards such as HTTP Live Streaming from Apple have been developed to support video streaming to iPhones and other smart mobile devices. This standard uses HTTP (Hypertext Transfer Protocol) rather than RTSP, which can allow it to pass through many firewalls in IP networks. Microsoft offers Smooth Streaming, which also dispenses with RTSP in favor of HTTP.

HTTP was not designed for video streaming, but it has been found to be very efficient. It was originally designed for file transfer, and not to maintain a persistent connection. More recently, a keep-alive mechanism was introduced, where a connection could be reused for more than one request. Using a persistent connection reduces the latency, because the TCP connection does not need to be renegotiated after the first request has been sent.

Figure 18.3 IP streaming technology development.

The progression of products enabling streaming video and audio is summarized in Figure 18.3.

Most of the discussion to this point has been about unicasting, or sending video from one source to one destination. However, many applications are multicast, such as live events, broadcast style IP TV or security system and traffic cameras.

18.8 Multicast Video

Multicast could be accomplished by running many parallel unicasts, assuming the video server has enough aggregate bandwidth to support many video streams in parallel. Even if the bandwidth did exist, it would be very inefficient. Instead, multicasting is primarily supported in the router. The router is required to recognize packets being multicast, replicate them, and send them to multiple destinations or addresses. This is not to be confused with IP broadcasting, where a single packet is sent to all devices on the local network.

Multicasting is generally not available on the public internet but it can be enabled on private or corporate networks. It places a large burden on the router however, and this should be considered, given the router is generally supporting a lot of other traffic. The router is also responsible for detecting and processing requests to add new ports or drop ports as users either request to watch, or drop off, the program.

Unlike unicasting, the user has no control over the delivery of a multicast. When connecting, they will join at whatever point the multicast program happens to be, and each viewer sees the same content simultaneously, similar to broadcast television. These restrictions have to be set against the significant savings in network traffic. The video server sends only a single stream, so its output bandwidth equals that of one unicast, and the same holds for each link between routers. If there are multiple viewers watching from a downstream router, then only the bandwidth of one unicast is required on the link to that router. Only if viewers are connected to different routers does the video stream need to be replicated at different parts of the network.
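The bandwidth saving can be made concrete with a small calculation. The sketch assumes a simple topology in which one upstream router feeds several downstream routers, each over its own link; the router names, viewer counts and the 4 Mbps stream rate are hypothetical.

```python
def link_loads_mbps(viewers_per_router: dict, stream_mbps: float,
                    multicast: bool) -> dict:
    """Video bandwidth on the link to each downstream router."""
    loads = {}
    for router, viewers in viewers_per_router.items():
        if viewers == 0:
            loads[router] = 0.0                   # nobody watching here
        elif multicast:
            loads[router] = stream_mbps           # one copy serves all viewers
        else:
            loads[router] = viewers * stream_mbps  # one copy per viewer
    return loads

viewers = {"A": 3, "B": 0, "C": 1}
print("multicast:", link_loads_mbps(viewers, 4.0, multicast=True))
print("unicast:  ", link_loads_mbps(viewers, 4.0, multicast=False))
```

With multicast, the load on each link is independent of the number of viewers behind it, as long as there is at least one.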

Multicast addressing is set up using session announcement protocol (SAP). SAP informs all multicast-enabled receivers of the programs being multicast on the network. The details of either connecting to or disconnecting from a multicast are covered in the Internet Group Management Protocol (IGMP). Using IGMP, the routers need to keep track of all users in their downstream path, to know whether or not the multicast program is to continue to be sent on a given port. If a new user requests to join, the appropriate router must replicate the program in that portion of the network if it is not already being sent there. The routers will also forward the SAP messages.
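The bookkeeping a router performs for joins and leaves can be sketched as a membership table. This is a simplified model and not the IGMP message exchange itself: it assumes the router tracks each viewer individually per port, and the class name, group address and host names are hypothetical.

```python
from collections import defaultdict

class MulticastRouter:
    """Toy membership table: which ports need a copy of which program."""

    def __init__(self):
        self._members = defaultdict(set)  # (group, port) -> set of hosts

    def join(self, group: str, port: int, host: str) -> bool:
        """Return True if the router must start replicating onto this port."""
        hosts = self._members[(group, port)]
        first_on_port = not hosts
        hosts.add(host)
        return first_on_port

    def leave(self, group: str, port: int, host: str) -> bool:
        """Return True if no viewers remain and forwarding can stop."""
        hosts = self._members.get((group, port), set())
        hosts.discard(host)
        if not hosts:
            self._members.pop((group, port), None)
            return True
        return False

    def forwarding_ports(self, group: str) -> list:
        return sorted(p for (g, p), hosts in self._members.items()
                      if g == group and hosts)

router = MulticastRouter()
router.join("239.1.1.1", 1, "viewer-a")
router.join("239.1.1.1", 2, "viewer-b")
print(router.forwarding_ports("239.1.1.1"))
```

Only the first join on a port and the last leave from it change what the router forwards; all other requests merely update the table.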

18.9 Video Conferencing

Video conferencing that feels like everyone is in the same room has been a business goal for a long time. Video conferencing can be done over IP, using private networks, but latency must be carefully controlled so the interactions and conversations between people are not delayed.

The H.320 standard defines video conferencing over switched telephone lines (ISDN), and is used in much early corporate video-conferencing equipment, usually in dedicated conference rooms at different sites within a company. H.323 evolved from H.320, allowing video (and audio) conferencing to be packet based. It is a full-featured system, with considerable maturity. A number of vendors offer systems based upon H.323, and generally equipment from different vendors will interoperate. Most systems support only audio, as audio or voice conferencing is far more ubiquitous than video conferencing. The video conferencing options of H.323 feature video compression using H.264.

Another standard known as session initiation protocol (SIP) has become very popular for packet-based audio and video conferencing. SIP focuses on the messaging needed to set up, maintain and tear down joint sessions between multiple users. The actual transport of the audio, video and data is over RTP. SIP also has additional features, such as instant messaging, and is generally more flexible than H.323.

Despite efforts to encourage the adoption of video conferencing as a primary means of holding long distance meetings, audio conferencing remains far more prevalent. However, live sharing of joint data or presentations in conference calls using technology like WebEx™ or Office Communicator has become commonplace in the business world, leveraging the capabilities of SIP.
