4
Quality of Experience for HTTP Adaptive Streaming Services

Ozgur Oyman, Vishwanath Ramamurthi, Utsaw Kumar, Mohamed Rehan and Rana Morsi

Intel Corporation, USA

4.1 Introduction

With the introduction of smartphones like the iPhone™ and Android™-based platforms, the emergence of new tablets like the iPad™, and the continued growth of netbooks, ultrabooks, and laptops, there is an explosion of powerful mobile devices in the market which are capable of displaying high-quality video content. In addition, these devices are capable of supporting various video-streaming applications, interactive video applications like video conferencing, and can capture video for video-sharing, video-blogging, video-Twitter™, and video-broadcasting applications. Cisco predicts that mobile traffic will grow by a factor of 11 until 2018, and that this traffic will be dominated by video (so, by 2018, over 66% of the world's mobile traffic will be video).1 As a result, future wireless networks will need to be optimized for the delivery of a range of video content and video-based applications.

Yet, video communication over mobile broadband networks today is challenging due to limitations in bandwidth and difficulties in maintaining the high reliability, quality, and latency demands imposed by rich multimedia applications. Even with the migration from 3G to 4G networks – or Radio Access Networks (RANs) and backhaul upgrades to 3G networks – the demand on capacity for multimedia traffic will continue to increase. As subscribers take advantage of new multimedia content, applications, and devices, they will consume all available bandwidth and still expect the same quality of service that came with their original service plans – if not better. Such consumer demand requires exploration of new ways to optimize future wireless networks for video services toward delivering higher user capacity to serve more users and also deliver enhanced Quality of Experience (QoE) for a rich set of video applications.

One of the key video-enhancing solutions is adaptive streaming, which is an increasingly promising method to deliver video to end-users, allowing enhancements in QoE and network bandwidth efficiency. Adaptive streaming aims to optimize and adapt the video configurations over time in order to deliver the best possible quality video to the user at any given time, considering changing link or network conditions, device capabilities, and content characteristics. Adaptive streaming is especially effective in better tackling the bandwidth limitations of wireless networks, but also allows for more intelligent video streaming that is device-aware and content-aware.

Most of the expected broad adoption of adaptive streaming will be driven by new deployments over the existing web infrastructure based on the HyperText Transfer Protocol (HTTP) [1], and this kind of streaming is referred to here as HTTP Adaptive Streaming (HAS). HAS follows the pull-based streaming paradigm, rather than the traditional push-based streaming based on stateful protocols such as the Real-Time Streaming Protocol (RTSP) [2], where the server keeps track of the client state and drives the streaming. In contrast, in pull-based streaming such as HAS, the client plays the central role by carrying the intelligence that drives the video adaptation (since HTTP is a stateless protocol, the server does not track the client's state). Several important factors have influenced this paradigm shift from traditional push-based streaming to HTTP streaming, including: (i) broad market adoption of HTTP and TCP/IP protocols to support the majority of Internet services offered today; (ii) HTTP-based delivery avoids Network Address Translation (NAT) and firewall traversal issues; (iii) a broad deployment of HTTP-based (non-adaptive) progressive download solutions already exists today, which can conveniently be upgraded to support HAS; and (iv) the ability to use standard/existing HTTP servers and caches instead of specialized streaming servers, allowing for reuse of the existing infrastructure and thereby providing better scalability and cost-effectiveness. Accordingly, the broad deployment of HAS technologies will serve as a major enhancement to (non-adaptive) progressive download methods, allowing for enhanced QoE enabled by intelligent adaptation to different link conditions, device capabilities, and content characteristics.

As a relatively new technology in comparison with traditional push-based adaptive streaming techniques, deployment of HAS techniques presents new challenges and opportunities for content developers, service providers, network operators, and device manufacturers. One such important challenge is developing evaluation methodologies and performance metrics to accurately assess user QoE for HAS services, and effectively utilizing these metrics for service provisioning and optimizing network adaptation. In that vein, this chapter provides an overview of HAS concepts and recent Dynamic Adaptive Streaming over HTTP (DASH) standardization, and reviews the recently adopted QoE metrics and reporting framework in Third-Generation Partnership Project (3GPP) standards. Furthermore, we present an end-to-end QoE evaluation study on HAS conducted over 3GPP LTE networks and conclude with a discussion of future directions and challenges in QoE optimization for HAS services.

4.2 HAS Concepts and Standardization Overview

HAS has already been spreading as a form of Internet video delivery, with the recent deployment of proprietary solutions such as Apple HTTP Live Streaming, Microsoft Smooth Streaming, and Adobe HTTP Dynamic Streaming.2 In the meantime, the standardization of HAS has also made great progress, with the recent completion of technical specifications by various standards bodies. In particular, DASH has recently been standardized by Moving Picture Experts Group (MPEG) and 3GPP as a converged format for video streaming [1, 2], and the standard has been adopted by other organizations including Digital Living Network Alliance (DLNA), Open IPTV Forum (OIPF), Digital Entertainment Content Ecosystem (DECE), World-Wide Web Consortium (W3C), and Hybrid Broadcast Broadband TV (HbbTV). DASH today is endorsed by an ecosystem of over 50 member companies at the DASH Industry Forum. Going forward, future deployments of HAS are expected to converge through broad adoption of these standardized solutions.

The scope of both MPEG and 3GPP DASH specifications [1, 2] includes a normative definition of a media presentation or manifest format (for DASH access client), a normative definition of the segment formats (for media engine), a normative definition of the delivery protocol used for the delivery of segments, namely HTTP/1.1, and an informative description of how a DASH client may use the provided information to establish a streaming service. This section will provide a technical overview of the key parts of the DASH-based server–client interfaces, which are part of MPEG and 3GPP DASH standards. More comprehensive tutorials on various MPEG and 3GPP DASH features can be found in [3–5].

The DASH framework between a client and web/media server is depicted in Figure 4.1. The media preparation process generates segments that contain different encoded versions of one or several media components of the media content. The segments are then hosted on one or several media origin servers, along with the Media Presentation Description (MPD) that characterizes the structure and features of the media presentation, and provides sufficient information to a client for adaptive streaming of the content by downloading the media segments from the server over HTTP. The MPD describes the various representations of the media components (e.g., bit rates, resolutions, codecs, etc.) and HTTP URLs of the corresponding media segments, timing relationships across the segments, and how they are mapped into media presentations.


Figure 4.1 HAS framework between the client and web/media server

The MPD is an XML-based document containing information on the content, based on a hierarchical data model as depicted in Figure 4.2. Each period consists of one or more adaptation sets. An adaptation set contains interchangeable/alternate encodings of one or more media content components encapsulated in representations (e.g., an adaptation set for video, one for primary audio, one for secondary audio, one for captions, etc.). In other words, representations encapsulate media streams that are considered to be perceptually equivalent. Typically, dynamic switching happens across representations within one adaptation set. Segment alignment permits non-overlapping decoding and presentation of segments from different representations. Stream Access Points (SAPs) indicate presentation times and positions in segments at which random access and switching can occur. DASH also uses a simplified version of XLink in order to allow loading parts of the MPD (e.g., periods) in real time from a remote location. The MPD can be static or dynamic: a dynamic MPD (e.g., for live presentations) also provides segment availability start time and end time, approximate media start time, and the fixed or variable duration of segments. It can change and will be periodically reloaded by the client, while a static MPD is valid for the whole presentation. Static MPDs are a good fit for video-on-demand applications, whereas dynamic MPDs are used for live and Personal Video Recorder (PVR) applications.
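As a rough illustration of this hierarchical data model, the following Python sketch walks a toy MPD and lists the representations inside each adaptation set. The XML here is deliberately simplified and non-conformant (a real MPD uses the urn:mpeg:dash:schema:mpd:2011 namespace and many more attributes); element and attribute names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Toy MPD illustrating the hierarchy: MPD -> Period -> AdaptationSet ->
# Representation. Simplified and namespace-free, for illustration only.
TOY_MPD = """<MPD mediaPresentationDuration="PT30S" type="static">
  <Period id="1">
    <AdaptationSet contentType="video">
      <Representation id="v0" bandwidth="500000" width="640" height="360"/>
      <Representation id="v1" bandwidth="1500000" width="1280" height="720"/>
    </AdaptationSet>
    <AdaptationSet contentType="audio">
      <Representation id="a0" bandwidth="64000"/>
    </AdaptationSet>
  </Period>
</MPD>"""

def list_representations(mpd_xml):
    """Walk Period/AdaptationSet/Representation and return
    (content_type, representation_id, bandwidth) tuples."""
    root = ET.fromstring(mpd_xml)
    out = []
    for period in root.findall("Period"):
        for aset in period.findall("AdaptationSet"):
            ctype = aset.get("contentType", "unknown")
            for rep in aset.findall("Representation"):
                out.append((ctype, rep.get("id"), int(rep.get("bandwidth"))))
    return out

print(list_representations(TOY_MPD))
```

A client would dynamically switch among the representations within one adaptation set, e.g. between v0 and v1 for the video component, while leaving the audio representation unchanged.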


Figure 4.2 DASH MPD hierarchical data model

A DASH segment constitutes the entity body of the response when issuing a HTTP GET or a partial HTTP GET request, and is the minimal individually addressable unit of data. DASH segment formats are defined for the ISO Base Media File Format (BMFF) and the MPEG2 Transport Stream format. A media segment contains media components and is assigned an MPD URL element and a start time in the media presentation. Segment URLs can be provided in the MPD in the form of exact URLs (segment list) or in the form of templates constructed via temporal or numerical indexing of segments. Dynamic construction of URLs is also possible, by combining parts of the URL (base URLs) that appear at different levels of the hierarchy. Each media segment also contains at least one SAP, which is a random access or switch-to point in the media stream where decoding can start using only data from that point forward. An initialization segment contains initialization information for accessing media segments contained in a representation and does not itself contain media data. Index segments, which may appear either as side files or within the media segments, contain timing and random access information, including media time vs. byte range relationships of sub-segments.
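The template-based construction of segment URLs can be sketched as follows. The base URL, template string, and identifiers below are hypothetical, and this toy expander handles only the $RepresentationID$ and $Number$ identifiers (real DASH templates also support $Time$, $Bandwidth$, and printf-style width formatting such as $Number%05d$).

```python
def segment_url(base_url, template, representation_id, number):
    """Expand a SegmentTemplate-style pattern into a media segment URL.
    Toy version: supports only $RepresentationID$ and $Number$."""
    path = (template
            .replace("$RepresentationID$", representation_id)
            .replace("$Number$", str(number)))
    return base_url + path

# Hypothetical values for illustration only.
url = segment_url("http://example.com/vod/",
                  "$RepresentationID$/seg-$Number$.m4s", "v1", 42)
print(url)  # http://example.com/vod/v1/seg-42.m4s
```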

DASH provides the ability to the client to fully control the streaming session (i.e., it can intelligently manage the on-time request and smooth playout of the sequence of segments), potentially adjusting bit rates or other attributes in a seamless manner. The client can automatically choose the initial content rate to match the initial available bandwidth and dynamically switch between different bit-rate representations of the media content as the available bandwidth changes. Hence, DASH allows fast adaptation to changing network and link conditions, user preferences, and device states (e.g., display resolution, CPU, memory resources, etc.). Such dynamic adaptation provides better user QoE, with higher video quality, shorter startup delays, fewer rebuffering events, etc.

At MPEG, DASH was standardized by the Systems Sub-Group, with the activity beginning in 2010, becoming a Draft International Standard in January 2011, and an International Standard in November 2011. The MPEG DASH standard [1] was published as ISO/IEC 23009-1:2012 in April 2012. In addition to the definition of media presentation and segment formats standardized in [1], MPEG has also developed additional specifications [6–8] on aspects of implementation guidelines, conformance and reference software, and segment encryption and authentication. Toward enabling interoperability and conformance, DASH also includes profiles as a set of restrictions on the offered MPD and segments based on the ISO BMFF [9] and MPEG2 Transport Streams [10], as depicted in Figure 4.3. In the meantime, MPEG DASH is codec agnostic and supports both multiplexed and non-multiplexed encoded content. Currently, MPEG is also pursuing several core experiments toward identifying further DASH enhancements, such as signaling of quality information, DASH authentication, server and network-assisted DASH operation, controlling DASH client behavior, and spatial relationship descriptions.


Figure 4.3 MPEG DASH profiles

At 3GPP, DASH was standardized by the 3GPP SA4 Working Group, with the activity beginning in April 2009 and Release 9 work with updates to Technical Specification (TS) 26.234 on the Packet Switched Streaming Service (PSS) [11] and TS 26.244 on the 3GPP file format [12] completed in March 2010. During Release 10 development, a new specification TS 26.247 on 3GPP DASH [2] was finalized in June 2011, in which ISO BMFF-based DASH profiles were adopted. In conjunction with a core DASH specification, 3GPP DASH also includes additional system-level aspects, such as codec and Digital Rights Management (DRM) profiles, device capability exchange signaling, and QoE reporting. Since Release 11, 3GPP has been studying further enhancements to DASH and toward this purpose collecting new use cases and requirements, as well as operational and deployment guidelines. Some of the documented use cases in the related Technical Report (TR) 26.938 [13] include: operator control for DASH (e.g., for QoE/QoS handling), advanced support for live services, DASH as a download format for push-based delivery services, enhanced ad insertion support, enhancements for fast startup and advanced trick play modes, improved operation with proxy caches, Multimedia Broadcast and Multicast Service (MBMS)-assisted DASH services with content caching at the User Equipment (UE) [8], handling special content over DASH and enforcing specific client behaviors, and use cases on DASH authentication.

4.3 QoE in 3GPP DASH

The development of QoE evaluation methodologies, performance metrics, and reporting protocols plays a key role in optimizing the delivery of HAS services. In particular, QoE monitoring and feedback are beneficial for detecting and debugging failures, managing streaming performance, enabling intelligent client adaptation (useful for device manufacturers), and allowing for QoE-aware network adaptation and service provisioning (useful for network operators and content/service providers). Having recognized these benefits, both 3GPP and MPEG bodies have adopted QoE metrics for HAS services as part of their DASH specifications. Moreover, the 3GPP DASH specification also provides mechanisms for triggering QoE measurements at the client device as well as protocols and formats for delivery of QoE reports to the network servers. Here, we shall describe in detail the QoE metrics and reporting framework for 3GPP DASH, while it should be understood that MPEG has also standardized similar QoE metrics in MPEG DASH.

In the 3GPP DASH specification TS 26.247, QoE measurement and reporting capability is defined as an optional feature for client devices. However, if a client supports the QoE reporting feature, the DASH standard also mandates the reporting of all the requested metrics at any given time (i.e., the client should be capable of measuring and reporting all of the QoE metrics specified in the standard). It should also be noted here that 3GPP TS 26.247 also specifies QoE measurement and reporting for HTTP-based progressive download services, where the set of QoE metrics in this case is a subset of those provided for DASH.

Figure 4.4 depicts the QoE monitoring and reporting framework specified in 3GPP TS 26.247, summarizes the list of QoE metrics standardized by 3GPP in the specification TS 26.247, and indicates the list of metrics applicable for DASH/HAS (adaptive streaming) and HTTP-based progressive download (non-adaptive). At a high level, the QoE monitoring and reporting framework is composed of the following phases: (1) server activates/triggers QoE reporting, requests a set of QoE metrics to be reported, and configures the QoE reporting framework; (2) client monitors or measures the requested QoE metrics according to the QoE configuration; (3) client reports the measured parameters to a network server. We now discuss each of these phases in the following sub-sections.


Figure 4.4 QoE metrics and reporting framework for 3GPP DASH and progressive download

4.3.1 Activation and Configuration of QoE Reporting

3GPP TS 26.247 specifies two options for the activation or triggering of QoE reporting. The first option is via the QualityMetrics element in the MPD and the second option is via the OMA Device Management (DM) QoE management object. In both cases, the trigger message from the server would include reporting configuration information such as the set of QoE metrics to be reported, the URIs for the server(s) to which the QoE reports should be sent, the format of the QoE reports (e.g., uncompressed or gzip), information on QoE reporting frequency and measurement interval, percentage of sessions for which QoE metrics will be reported, and Access Point Name (APN) to be used for establishing the Packet Data Protocol (PDP) context for sending the QoE reports.

4.3.2 QoE Metrics for DASH

The following QoE metrics have been defined in 3GPP DASH specification TS 26.247, to be measured and reported by the client upon activation by the server. It should be noted that these metrics are specific to HAS and content streaming over the HTTP/TCP/IP stack, and therefore differ considerably from QoE metrics for traditional push-based streaming protocols.

  • HTTP request/response transactions. This metric essentially logs the outcome of each HTTP request and corresponding HTTP response. For every HTTP request/response transaction, the client measures and reports (i) the type of request (e.g., MPD, initialization segment, media segment, etc.), (ii) times for when the HTTP request was made and the corresponding HTTP response was received (in wall clock time), (iii) the HTTP response code, (iv) contents in the byte-range-spec part of the HTTP range header, (v) the TCP connection identifier, and (vi) throughput trace values for successful requests. From the HTTP request/response transactions, it is also possible to derive more specific performance metrics such as the fetch durations of the MPD, initialization segment, and media segments.
  • Representation switch events. This metric is used to report a list of representation switch events that took place during the measurement interval. A representation switch event signals the client's decision to perform a representation switch from the currently presented representation to a new representation that is later presented. As part of each representation switch event, the client reports the identifier for the new representation, the time of the switch event (in wall clock time) when the client sent the first HTTP request for the new representation, and the media time of the earliest media sample played out from the new representation.
  • Average throughput. This metric indicates the average throughput that is observed by the client during the measurement interval. As part of the average throughput metric, the client measures and reports (i) the total number of content bytes (i.e., the total number of bytes in the body of the HTTP responses) received during the measurement interval, (ii) the activity time during the measurement interval, defined as the time during which at least one GET request is still not completed, (iii) the wall clock time and duration of the measurement interval, (iv) the access bearer for the TCP connection for which the average throughput is reported, and (v) the type of inactivity (e.g., pause in presentation, etc.).
  • Initial playout delay. This metric signals the initial playout delay at the start of the streaming of the presentation. It is measured as the time from when the client requests the fetch of the first media segment (or sub-segment) to the time at which media is retrieved from the client buffer.
  • Buffer level. This metric provides a list of buffer occupancy-level measurements carried out during playout. As part of the buffer-level metric, the client measures and reports the buffer level that indicates the playout duration for which media data is available, starting from the current playout time along with the time of the measurement of the buffer level.
  • Play list. This metric is used to log a list of playback periods in the measurement interval, where each playback period is the time interval between a user action and whichever occurs soonest of the next user action, the end of playback, or a failure that stops playback. The types of user action that trigger playout include a new playout request, resuming playout from pause, or a user-requested quality change. For each playback period, the client measures and reports the identifiers of the representations that were rendered and their rendering times (in media time) and durations, the playback speed relative to normal playback speed (e.g., to track trick modes such as fast forward or rewind), and the reasons why continuous playback of a representation was interrupted (e.g., due to representation switch events, rebuffering, a user request, or the end of a period, of the media content, or of a metrics collection period).
  • MPD information. This metric allows for reporting information on the media presentations from the MPD so that servers without direct access to the MPD can learn the media characteristics. Media representation attributes on bit rate, resolution, quality ranking, and codec-related media information – including profile and level – can be reported by the client via this metric.
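As an illustration of how the average throughput metric above could be derived from logged HTTP transactions, the following sketch divides the total response-body bytes by the activity time, i.e., the time during which at least one GET request was outstanding (merging overlapping request/response spans). The transaction log format is our own assumption, not the TS 26.247 report format.

```python
def average_throughput(transactions):
    """Average throughput over a measurement interval: total HTTP
    response body bytes divided by the activity time. Each transaction
    is (request_time, response_done_time, body_bytes), in seconds."""
    total_bytes = sum(b for _, _, b in transactions)
    spans = sorted((t0, t1) for t0, t1, _ in transactions)
    activity = 0.0
    cur_start, cur_end = None, None
    for t0, t1 in spans:
        if cur_start is None:
            cur_start, cur_end = t0, t1
        elif t0 <= cur_end:                 # overlapping transactions
            cur_end = max(cur_end, t1)
        else:                               # gap: close the current span
            activity += cur_end - cur_start
            cur_start, cur_end = t0, t1
    if cur_start is not None:
        activity += cur_end - cur_start
    return 8 * total_bytes / activity if activity > 0 else 0.0  # bits/s

log = [(0.0, 1.0, 125000), (1.5, 2.5, 125000)]  # two 1 s downloads
print(average_throughput(log))  # 1000000.0 bits/s over 2 s of activity
```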

4.3.3 QoE Reporting Protocol

In 3GPP DASH, QoE reports are formatted as an eXtensible Markup Language (XML)3 document complying with the XML schema provided in specification TS 26.247. The client uses HTTP POST request signaling (RFC 2616) carrying XML-formatted metadata in its body to send the QoE report to the server.
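A minimal sketch of such a report upload is shown below, using only the Python standard library. The XML payload and server URI are illustrative and do not follow the normative TS 26.247 schema; the optional gzip branch mirrors the compressed report format option mentioned above.

```python
import gzip
import urllib.request

def build_qoe_report_request(server_uri, xml_report, gzip_body=False):
    """Build an HTTP POST carrying an XML-formatted QoE report in its
    body. Sketch only: payload and headers are illustrative."""
    body = xml_report.encode("utf-8")
    headers = {"Content-Type": "text/xml"}
    if gzip_body:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return urllib.request.Request(server_uri, data=body,
                                  headers=headers, method="POST")

# Hypothetical report and server URI for illustration only.
report = "<QoeReport><BufferLevel t='PT5S' level='12000'/></QoeReport>"
req = build_qoe_report_request("http://example.com/qoe", report)
print(req.method, req.get_header("Content-type"))
# The report would then be sent with urllib.request.urlopen(req).
```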

4.4 Link-Aware Adaptive Streaming

The central intelligence in HAS resides in the client rather than the server. The requested representation levels of the video chunks (forming the HAS segments) are determined by the client and communicated to the server. Based on the client buffer level, the operation of the client in a link-aware adaptive streaming framework can be characterized by four modes or states: (i) startup mode, (ii) transient state, (iii) steady state, and (iv) rebuffering state (see Figure 4.5).


Figure 4.5 Adaptive streaming client player states

Startup mode is the initial buffering mode, during which the client buffers video frames to a certain limit before beginning to play back the video (i.e., the client is in the startup mode as long as A_i ⩽ A^thresh_StartUp, where A_i represents the total number of video frames received until frame slot i). Steady state represents the state in which the UE buffer level is above a certain threshold (i.e., B_i ⩾ B^thresh_Steady), where B_i tracks the number of frames in the client buffer that are available for playback in frame slot i. The transient state is the state in which the UE buffer level falls below a certain limit after beginning to play back (i.e., B_i < B^thresh_Steady). The rebuffering state is the state that the client enters when the buffer level becomes zero after beginning to play back. Once it enters the rebuffering state, it remains in that state until it rebuilds its buffer level to a satisfactory level to begin playback (i.e., until B_i ⩾ B^thresh_Rebuff).
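The state transitions described above can be sketched as a small state machine. The threshold values below are arbitrary illustrative numbers, not values mandated by any specification.

```python
STARTUP, TRANSIENT, STEADY, REBUFFERING = (
    "startup", "transient", "steady", "rebuffering")

def next_state(state, frames_received, buffer_frames,
               a_startup=50, b_steady=100, b_rebuff=50):
    """One transition of the player state machine (illustrative
    thresholds, in frames)."""
    if state == STARTUP:
        # keep buffering until enough frames arrive to start playback
        if frames_received < a_startup:
            return STARTUP
        return STEADY if buffer_frames >= b_steady else TRANSIENT
    if state == REBUFFERING:
        # stay until the buffer is rebuilt to the rebuffering threshold
        if buffer_frames < b_rebuff:
            return REBUFFERING
        return STEADY if buffer_frames >= b_steady else TRANSIENT
    # playing (transient or steady)
    if buffer_frames == 0:
        return REBUFFERING
    return STEADY if buffer_frames >= b_steady else TRANSIENT

print(next_state(STARTUP, frames_received=60, buffer_frames=60))    # transient
print(next_state(TRANSIENT, frames_received=200, buffer_frames=0))  # rebuffering
```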

One of the key aspects of adaptive streaming is the estimate of available link bandwidth. A typical throughput estimate is the average segment or HAS throughput, which is defined as the average ratio of segment size to download time of HAS segments:

  R^seg_i(j) = (1/F) · Σ_{s = S^j_i − F + 1}^{S^j_i} S_j(s) / T^j_dwld(s)    (4.1)

where S_j(s), T^j_fetch(s), and T^j_dwld(s) are the size, fetch time, and download time of the s-th video segment of client j, S^j_i the number of segments downloaded by client j until frame slot i, and F the number of video segments over which the average is computed. Based on this estimate, the best video representation level possible for the next video segment request is determined as the highest level whose encoded bit rate does not exceed the throughput estimate:

  Q^sup_i = max{ q : R_q ⩽ R^seg_i(j) }    (4.2)

where R_q denotes the encoded bit rate of representation level q.

The key QoE metrics of interest are: (i) startup delay, (ii) startup video quality, (iii) overall average video quality, and (iv) rebuffering percentage. Startup delay refers to the amount of time it takes to download the initial frames necessary to begin playback. Average video quality is the average video quality experienced by a user. Startup video quality refers to the average video quality in the startup phase. Rebuffering percentage is the percentage of time the client spends in the rebuffering state. It has been observed that rebuffering is the most annoying impairment to video-streaming users, and hence it is important to keep the rebuffering percentage low by judicious rate adaptation.
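For example, the rebuffering percentage can be computed from a trace of player states and durations; the trace format here is an assumption for illustration.

```python
def rebuffering_percentage(state_trace):
    """state_trace: list of (state, duration_seconds) after playback
    start. Returns the percentage of time spent rebuffering."""
    total = sum(d for _, d in state_trace)
    stalled = sum(d for s, d in state_trace if s == "rebuffering")
    return 100.0 * stalled / total if total else 0.0

trace = [("steady", 90.0), ("rebuffering", 5.0), ("transient", 5.0)]
print(rebuffering_percentage(trace))  # 5.0
```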

Typical HAS algorithms use either application- or transport-layer throughputs (as in Eq. (4.1)) for video rate adaptation [14]. We refer to these approaches as PHY Link Unaware (PLU). However, using higher-layer throughputs alone can potentially have adverse effects on user QoE when the estimated value differs from what the wireless link conditions actually provide – a lower estimate results in lower quality and a higher estimate can result in rebuffering. These situations typically occur in wireless links due to changes in environmental and/or loading conditions. In [15], a Physical Link-Aware (PLA) approach to adaptive streaming was proposed to improve video QoE in changing wireless conditions. Physical-layer (PHY) goodput, used as a complement to higher-layer throughput estimates, allows us to track radio-link variations at a finer time scale. This opens up the possibility of opportunistic link-aware video-rate adaptation that can improve the QoE of the user. The average PHY-layer goodput at time t is defined as the ratio of the number of bits received during the time period (t − T, t) to the averaging duration T as follows:

  R^phy_t = B(t − T, t) / T    (4.3)

where B(t − T, t) denotes the number of bits successfully received over the radio link during (t − T, t).

Using PHY goodput for HAS requires collaboration between the application and the physical layers, but it can provide ways to improve various QoE metrics for streaming over wireless using even simple enhancements. Here we describe two simple enhancements for the startup and steady states.

Typical HAS startup algorithms request one video segment every frame slot at the lowest representation level to build the playback buffer quickly. This compromises the playback video quality during the startup phase. Link-aware startup can be used to optimize video quality based on wireless link conditions right from the beginning of the streaming session. An incremental quality approach can be used so that the startup delay does not increase beyond satisfactory limits due to quality optimization. The next available video representation rate is chosen only if enough bandwidth is available to support such a rate. For this purpose, the ratio δ_i is defined as follows:

  δ_i = R^phy_t(i) / R_{Q_{i−1}+1}    (4.4)

where Q_{i−1} is the representation level requested in the previous frame slot.

This ratio represents the ratio of the average PHY goodput to the bit rate of the next possible video representation level. Q_0 is initialized based on historical PHY goodput information before the start of the streaming session:

  Q_0 = max{ q : R_q ⩽ R^phy_hist }    (4.5)

where R^phy_hist denotes the average PHY goodput observed prior to the session.

The representation level for the segment request in frame slot i is then selected as follows:

  Q_i = Q_{i−1} + 1,  if δ_i ⩾ (1 + α)
  Q_i = Q_{i−1},      otherwise    (4.6)

The next representation level is chosen only when δ_i is at least (1 + α). α > 0 is a parameter that can be chosen depending on how aggressively or conservatively we would like to optimize quality during the startup phase. The condition δ_i ⩾ (1 + α) ensures that the rate adaptation does not react to small-scale fluctuations of wireless link conditions.
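Under these assumptions, the incremental startup step can be sketched as follows. The representation bit rates and the value of α below are illustrative, not values from the evaluation in this chapter.

```python
def startup_next_level(current_level, phy_goodput, rep_rates, alpha=0.2):
    """Link-aware startup (sketch): step up one representation level
    only when average PHY goodput exceeds the next level's bit rate by
    the margin alpha, i.e. delta >= 1 + alpha."""
    if current_level + 1 >= len(rep_rates):
        return current_level                       # already at the top
    delta = phy_goodput / rep_rates[current_level + 1]
    return current_level + 1 if delta >= 1 + alpha else current_level

rates = [500e3, 1000e3, 2000e3, 4000e3]            # bits/s per level
print(startup_next_level(0, phy_goodput=1.5e6, rep_rates=rates))  # 1
print(startup_next_level(1, phy_goodput=1.5e6, rep_rates=rates))  # 1 (2 Mbit/s * 1.2 > 1.5 Mbit/s)
```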

For the following evaluation results, we use Peak Signal-to-Noise Ratio (PSNR) for video quality, although our approach is not restricted to this and other metrics such as Structural Similarity (SSIM) could also be used.

Figure 4.6 shows a comparison of the Cumulative Distribution Functions (CDF) of startup delay and average video quality during the startup phase for PLA and PLU approaches. For the 75-user scenario, PHY link awareness can improve the average startup quality by 2 to 3 dB for more than 90% of users, at the cost of only a slight (tolerable) increase in startup delay. In the 150-user scenario, we see a slightly lower 1 to 2 dB improvement in average video quality for more than 50% of users, with less than 0.25 s degradation in startup delay. These results demonstrate that PLA can enhance the QoE during the startup phase by improving video quality with an almost unnoticeable increase in startup delay.


Figure 4.6 Startup delay and startup quality comparison for PLA and PLU approaches

In the steady-state mode, the buffer level at the client is above a certain level. In traditional steady-state algorithms, the objective is to maintain the buffer level without compromising video quality. This is typically done by periodically requesting one segment worth of frames for each segment duration in the steady state. However, this might result in rebuffering over fluctuating wireless links. PHY goodput responds quickly to wireless link variations, while segment throughput responds more slowly to link variations. So, PHY goodput can be used as a complement to segment throughput to aid in rate adaptation. When link conditions are good, R^phy_t > R^seg_i, and when link conditions are bad, R^phy_t < R^seg_i. A conservative estimate of the maximum throughput is determined based on PHY goodput and segment throughput, which helps avoid rebuffering in the steady state. Such a conservative estimate may be achieved as follows:

  R^con_i = min( R^seg_i, β · R^phy_t )    (4.7)

This approach ensures that (i) when the link conditions are bad and segment throughput is unable to follow the variation in link conditions, we use PHY goodput to lower the estimate of the link bandwidth that is used for video rate adaptation, and (ii) when the link conditions are good in steady state, we get video quality as good as with the PLU approach. The constant β in Eq. (4.7) prevents short-term variations in link conditions from changing the rate adaptation. The best video representation level possible in frame slot i, Q^sup_i, is determined conservatively based on R^con_i:

  Q^sup_i = max{ q : R_q ⩽ R^con_i }    (4.8)
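A sketch of this conservative steady-state selection, under the reconstruction R_con = min(R_seg, β · R_phy) used here, is shown below; the representation bit rates and the value of β are illustrative assumptions.

```python
def conservative_rate(seg_throughput, phy_goodput, beta=1.1):
    """Conservative bandwidth estimate (sketch): cap the
    segment-throughput estimate by a scaled PHY goodput so a
    deteriorating link pulls the estimate down quickly."""
    return min(seg_throughput, beta * phy_goodput)

def steady_state_level(seg_throughput, phy_goodput, rep_rates, beta=1.1):
    """Highest representation whose bit rate fits the conservative
    estimate (falls back to the lowest level if none fits)."""
    r_con = conservative_rate(seg_throughput, phy_goodput, beta)
    feasible = [q for q, r in enumerate(rep_rates) if r <= r_con]
    return feasible[-1] if feasible else 0

rates = [500e3, 1000e3, 2000e3, 4000e3]
print(steady_state_level(2.5e6, 2.4e6, rates))  # good link: level 2
print(steady_state_level(2.5e6, 1.0e6, rates))  # link dropped: level 1
```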

Figure 4.7 compares the CDFs of rebuffering percentage and average playback video quality performance using PLA and PLU approaches for 100 and 150 users. In the 100-user scenario, the number of users not experiencing rebuffering improves from around 75% to 92% (a 17% improvement) and the peak rebuffering percentage experienced by any user reduces from around 30% to 13% using the PLA approach. This improvement in rebuffering performance is at the cost of only a slight degradation in video quality (0.6 dB average) compared with the PLU approach for some users. In the highly loaded 150-user scenario, we observe that using the PLA approach we can obtain around a 20% improvement in number of users not experiencing rebuffering (from around 56% to 76%) at the cost of minimal degradation in average video quality by less than 0.5 dB on average for 50% of users. Thus, PLA can enhance the user QoE during video playback by reducing the rebuffering percentage significantly at the cost of a very minor reduction in video quality.


Figure 4.7 Rebuffering and average quality comparison for PLA and PLU approaches

4.5 Video-Aware Radio Resource Allocation

Wireless links are fluctuating by nature. In most cellular wireless networks, the UEs send the Base Station (BS) periodic feedback regarding the quality of the wireless link that they are experiencing, in the form of Channel Quality Information (CQI). The CQI sent by the UEs is discretized, thus making the overall channel state m discrete. The BS translates the CQI into a peak rate vector μ_m = (μ_m1, μ_m2, ..., μ_mJ), with μ_mj representing the peak achievable rate of user j in channel state m. For every scheduling resource, the BS has to decide which user to schedule in that resource. Always scheduling the best user would maximize cell throughput but may result in poor fairness. Scheduling resources in a round-robin fashion fails to take advantage of the wireless link quality information that is available. So, typical resource allocation algorithms in wireless networks seek to optimize the average service rates R = (R_1, R_2, ..., R_J) to users such that a concave utility function H(R) is maximized subject to the capacity (resource) limits in the wireless scenario under consideration, i.e.

  maximize H(R) subject to R ∈ V    (4.9)

where V represents the capacity region of the system. Utility functions of the sum form have attracted the most interest:

where each Hj(Rj) is a strictly concave, continuously differentiable function defined for Rj > 0. The Proportional Fair (PF) and Maximum Throughput (MT) scheduling algorithms are special cases of objective functions of the form, with Hj(Rj) = log (Rj) and H(Rj) = Rj, respectively.
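The per-slot decision implied by such sum-form utilities can be sketched in a few lines. The following is an illustrative Python sketch (not code from the chapter): with Hj(Rj) = log(Rj), the gradient scheduling rule reduces to picking, in each slot, the user maximizing the PF metric μj/Rj; the peak and average rate values below are made-up numbers.

```python
# Illustrative sketch of gradient scheduling with Hj(Rj) = log(Rj), i.e.
# Proportional Fair (PF). Not from the chapter; values are invented.

def pf_schedule(peak_rates, avg_rates):
    """Pick the user j maximizing the PF metric mu_j / R_j."""
    metrics = [mu / r for mu, r in zip(peak_rates, avg_rates)]
    return max(range(len(metrics)), key=lambda j: metrics[j])

def update_avg_rates(avg_rates, scheduled, peak_rates, beta=0.01):
    """Exponential moving-average update of the service rates R_j."""
    return [(1 - beta) * r + beta * (peak_rates[j] if j == scheduled else 0.0)
            for j, r in enumerate(avg_rates)]

# Example: user 1 has the best peak-to-average ratio and gets the slot.
peaks = [2.0, 3.0, 1.0]   # mu_j in the current channel state (Mbit/s)
avgs  = [1.5, 1.0, 0.9]   # current moving-average rates R_j
j = pf_schedule(peaks, avgs)            # -> 1
avgs = update_avg_rates(avgs, j, peaks)
```

Note how the PF metric trades throughput (the numerator) against fairness (the denominator): a user with a low average rate needs a smaller peak rate to win the slot.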

The key objective of a video-aware optimization framework for multi-user resource allocation is to reduce the possibility of rebuffering without interfering with the rate-adaptation decisions taken by the HAS client. To this end, a buffer-level feedback-based scheduling algorithm in the context of HAS was proposed in [10] by modifying the utility function of the PF algorithm to give priority to users with buffer levels lower than a threshold. However, this emergency-type response penalizes other users into rebuffering, especially at high loading conditions, thus decreasing the effectiveness of the algorithm. To overcome this limitation, a video-aware optimization framework that constrains rebuffering was proposed in [16]. In order to avoid rebuffering at a video client, video segments need to be downloaded at a rate that is faster than the playback rate of the video segments. Let Tj(s) be the duration of time taken by user j to download a video segment s and τj(s) be the media duration of the segment. Then, to avoid rebuffering, the following constraint is introduced:

where δ > 0 is a small design parameter to account for variability in wireless network conditions. Segment download time Tj(s) depends on the size of the video segment Sj(s) and the data rates experienced by user j. Sj(s) in turn depends on the video content and representation (adaptation) level that is chosen by the HAS client. The HAS client chooses the representation level for each video segment based on its state and its estimate of the available link bandwidth. Based on all this, we propose a Rebuffering Constrained Resource Allocation (RCRA) framework as follows:

The additional constraints related to rebuffering closely relate the buffer evolution at HAS clients to resource allocation at the base station. Intelligent resource allocation at the BS can help reduce rebuffering in video clients.

Enforcing the rebuffering constraints in Eq. (4.12) in a practical manner requires feedback from HAS clients. Each adaptive streaming user can feed back its media playback buffer level periodically to the BS scheduler in addition to the normal CQI feedback. The buffer-level feedback can be sent directly over the RAN or, more practically, indirectly through the video server.

Scheduling algorithms for multi-user wireless networks need to make decisions during every scheduling time slot (resource) t in such a way as to lead to a long-term optimal solution. The scheduling time slot for modern wireless networks is typically at much finer granularity than a (video) frame slot. A variant of the gradient scheduling algorithm called the Rebuffering-Aware Gradient Algorithm (RAGA) in [16] can be used to solve the optimization problem in Eq. (4.12) by using a token-based mechanism to enforce the rebuffering constraints. The RAGA scheduling decision in scheduling time slot t when the channel state is m(t) can be summarized as follows:

where Rj(t) is the current moving-average service-rate estimate for user j. It is updated every scheduling time slot as in the PF scheduling algorithm, i.e.

Rj(t + 1) = (1 − β)Rj(t) + βμj(t),

where β > 0 is a small parameter that determines the time scale of averaging and μj(t) is the service rate of user j in time slot t: μj(t) = μm(t)j if user j was scheduled in time slot t and μj(t) = 0 otherwise. Wj(t) in Eq. (4.13) is a video-aware user-token parameter and aj(t) is a video-aware user-time-scale parameter, both of which are updated based on periodic media-buffer-level feedback. These parameters hold the key to enforcing rebuffering constraints at the BS. Such a feedback mechanism has been defined in the DASH standard [1, 2] and is independent of the specific client player implementation. For simplicity, we assume that such client media-buffer-level feedback is available only at the granularity of a frame slot. Therefore, the user-token parameter and user-time-scale parameter are constant within a frame slot, i.e.

Wj(t) = Wij and aj(t) = aij for all scheduling time slots t within frame slot i.

Let Bij represent the buffer-status feedback in frame slot i, in units of media time duration. The difference between buffer levels from frame slot (i − 1) to frame slot i is given by

Bij,diff = Bij − Bi−1j.

A positive value of Bij,diff indicates an effective increase in media buffer size over the previous reporting duration and a negative value indicates a decrease in media buffer size. Note that this difference depends on the frame playback and download processes at the HAS client. To avoid rebuffering, we would like the rate of change of the client media buffer level to be greater than a certain positive threshold, i.e.

Bij,diff ≥ δτ.

The media-buffer-aware user-token parameter is updated every frame slot as follows:

Wij = max(Wi−1j + δτ − Bij,diff, 0).     (4.17)

The intuitive interpretation of Eq. (4.17) is that if the rate of media buffer change for a certain user is below the threshold, the token parameter is incremented by an amount (δτ − Bij,diff) that reflects the relative penalty for having a buffer change rate below the threshold. This increases the user's relative scheduling priority compared with other users whose media buffer change rate is higher. Similarly, when the rate of buffer change is above the threshold, the user-token parameter is decreased to offset any previous increase in scheduling priority. Wij is never reduced below zero, reflecting the fact that all users with a consistent buffer change rate greater than the threshold have scheduling priorities as per the standard proportional fair scheduler.

The video-aware parameter aij determines the time scale over which rebuffering constraints are enforced for adaptive streaming users. A larger value of aij implies greater urgency in enforcing the rebuffering constraints for user j. In a HAS scenario, the values of aij can be set to reflect this relative urgency for different users. Therefore, we set aij based on the media buffer level of user j in frame slot i as follows:

where φ is a scaling constant, Bij is the current buffer level in seconds for user j, and Bthresh is the threshold for steady-state operation of the HAS video client. If the buffer level Bij for user j is above the threshold, then aij = 1; if it is below the threshold, aij scales to give relatively higher priority to users with lower buffer levels. This scaling of priorities based on absolute user buffer levels improves the convergence of the algorithm. The user-time-scale parameter aij is set to 0 for non-adaptive-streaming users, turning the metric in Eq. (4.13) into a standard PF metric. Note that the parameter Wj(t) is updated based on the rate of media-buffer-level change, while the parameter aj(t) is updated based on the buffer levels themselves. This approach provides continuous adaptation of user scheduling priorities based on media-buffer-level feedback (unlike an emergency-type response) and reduces the rebuffering percentage of users without significantly affecting video quality.
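The two per-frame-slot parameter updates described above can be sketched as follows. This is a hedged illustration: the token update follows the description of Eq. (4.17), but the exact functional form of aij in [16] may differ; the inverse scaling below the threshold is an assumed form for illustration only.

```python
# Hedged sketch of the per-frame-slot RAGA parameter updates described in the
# text. The token update mirrors Eq. (4.17); the time-scale form is assumed.

def update_token(w_prev, b_diff, delta_tau):
    """Token W: grows by the shortfall (delta_tau - b_diff) when the buffer
    change rate is below the threshold, shrinks otherwise, never below 0."""
    return max(0.0, w_prev + (delta_tau - b_diff))

def update_timescale(buffer_level, b_thresh, phi=1.0):
    """Time-scale a: 1 above the steady-state threshold; below it, scaled up
    to prioritize users with lower buffers (assumed inverse scaling)."""
    if buffer_level >= b_thresh:
        return 1.0
    return phi * b_thresh / max(buffer_level, 1e-6)
```

A user whose buffer is draining (negative b_diff) accumulates tokens and hence scheduling priority, while a user comfortably above the steady-state threshold is scheduled as under plain PF.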

Figure 4.8 compares the rebuffering percentage and the Perceived Video Quality (PVQ) of the RAGA resource allocation algorithm with the standard Proportional Fair (PF), Proportional Fair with Barrier for Frames (PFBF), and Gradient with Minimum Rate (GMR) algorithms in a 100-user scenario. For GMR, we set the minimum rate for each video user to the rate of the lowest representation level of the user's video. PVQ is computed as the difference between the mean and standard deviation of PSNR; only played-out video frames are considered in its computation. Observe that RAGA has the lowest rebuffering percentage among all the schemes across all the users: it reduces both the number of users experiencing rebuffering and the amount of rebuffering they experience. The PVQ using RAGA is better than PF scheduling for all users. GMR is better than PF in terms of rebuffering, but it still lags behind RAGA in rebuffering performance due to a lack of dynamic cooperation with the video clients. Although GMR appears to have marginally better PVQ than RAGA, this comes at a huge cost in terms of increased rebuffering percentages. PFBF performs better than GMR in terms of peak rebuffering percentage but lags behind both PF and GMR in terms of the number of users experiencing rebuffering. Also, PFBF has better PVQ than all other schemes for some users and worse for others. The disadvantage of PFBF is that it reacts to low buffer levels in an emergency fashion and inadvertently penalizes good users to satisfy users with low buffer levels. RAGA instead continually adjusts the scheduling priorities of the users based on the rate of change of their media buffer levels, thus improving the QoE of streaming users in terms of reduced rebuffering and balanced PVQ.

images

Figure 4.8 Rebuffering and average quality comparison for RAGA with different scheduling approaches

4.6 DASH over e-MBMS

As the multicast standard for Long-Term Evolution (LTE), enhanced Multimedia Broadcast Multicast Service (eMBMS) was introduced by 3GPP to facilitate scalable delivery of popular content to multiple users over a cellular network. Relevant use cases for eMBMS include delivery of popular YouTube clips, live sports events, news updates, advertisements, and file sharing. eMBMS utilizes the network bandwidth more efficiently than unicast delivery by exploiting the inherent broadcast nature of wireless channels. For unicast transmissions, retransmissions based on Automatic Repeat Request (ARQ) and/or Hybrid ARQ (HARQ) are used to ensure reliability. For a broadcast transmission, however, implementing ARQ can lead to network congestion, with multiple users requesting different packets. Moreover, different users might lose different packets, and retransmission could mean sending a large chunk of the original content again, leading to inefficient use of bandwidth as well as increased latency for some users. Application Layer Forward Error Correction (AL-FEC) is an error-correction mechanism in which redundant data is sent to facilitate recovery of lost packets. For this purpose, Raptor codes [17, 18] were adopted in 3GPP TS 26.346 [19] as the AL-FEC scheme for MBMS delivery. Recently, improvements to the Raptor codes have been developed, and an enhanced code called RaptorQ has been specified in RFC 6330 [20] and proposed to 3GPP. Streaming delivery (based on the H.264/AVC video codec and the Real-time Transport Protocol (RTP)) over MBMS was studied in [21].

The forthcoming discussion presents the existing standardized framework in TS 26.346 [19] for live streaming of DASH-formatted content over eMBMS. eMBMS-based live video streaming uses the FLUTE protocol [22] – File Delivery over Unidirectional Transport – which allows transmission of files via unidirectional eMBMS bearers. Each video session is delivered as a FLUTE transport object, as depicted in Figure 4.9, and transport objects are created as soon as packets arrive. The IPv4/UDP/FLUTE header totals 44 bytes per IP packet. Protection against potential packet errors can be enabled through the use of AL-FEC. The AL-FEC framework decomposes each file into a number of source blocks of approximately equal size. Each source block is then broken into K source symbols of fixed size T bytes. The Raptor/RaptorQ codes are used to form N encoding symbols from the original K source symbols, where N > K. Both Raptor and RaptorQ are systematic codes, which means that the original source symbols are transmitted unchanged as the first K encoding symbols. The encoding symbols are then used to form IP packets, which are transmitted. At the decoder, the whole source block can be recovered, with very high probability, from any set of encoding symbols only slightly larger than K. Detailed comparisons between Raptor and RaptorQ are presented in [23]. The choice of the AL-FEC parameters is made at the Broadcast Multicast Service Center (BMSC); for example, the BMSC has to select the number of source symbols K, the code rate K/N, and the source symbol size T. For a detailed discussion of the pros and cons of these parameter choices, the reader is referred to [24].
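The source-block framing described above can be illustrated with a toy sketch. A real deployment would use Raptor/RaptorQ encoders; here the N − K repair symbols are simple XOR parities, purely to show the systematic structure (source symbols transmitted unchanged, repair symbols appended).

```python
# Toy illustration of systematic AL-FEC framing. NOT a Raptor/RaptorQ
# implementation: repair symbols here are plain XOR parities for clarity.

def make_source_symbols(block: bytes, T: int):
    """Split one source block into K fixed-size symbols of T bytes each,
    zero-padding the last symbol if needed."""
    padded = block + b"\x00" * (-len(block) % T)
    return [padded[i:i + T] for i in range(0, len(padded), T)]

def xor_parity(symbols):
    """Bytewise XOR of all symbols (toy stand-in for a repair symbol)."""
    out = bytearray(len(symbols[0]))
    for s in symbols:
        for i, b in enumerate(s):
            out[i] ^= b
    return bytes(out)

def encode_systematic(symbols, n_repair: int):
    """Systematic code: the K source symbols come first, unchanged,
    followed by n_repair = N - K repair symbols."""
    return symbols + [xor_parity(symbols)] * n_repair

syms = make_source_symbols(b"abcdefgh", T=3)   # K = 3 symbols
enc = encode_systematic(syms, n_repair=2)      # N = 5 encoding symbols
```

Each encoding symbol would then become the payload of one FLUTE/UDP/IP packet.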

images

Figure 4.9 Transport-layer processing overview

For a live service, a long encoding wait is undesirable. However, to ensure good Raptor/RaptorQ performance, a sufficiently large value of K needs to be chosen. Thus, the minimum value of K, Kmin, is an important design parameter: a larger Kmin causes a larger startup delay, whereas a smaller Kmin leads to poorer code performance. N encoding symbols are generated from the K source symbols using the AL-FEC (Raptor/RaptorQ) scheme. IP packets are then formed using these encoding symbols as payloads; each FLUTE packet is generated from the FLUTE header and a payload containing encoding symbols.
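The Kmin-versus-delay tradeoff can be made concrete with a back-of-the-envelope sketch: before a source block can be encoded, the sender must buffer roughly Kmin × T bytes of media. The assumption below that the block fills at the bearer/media bit rate of 1.0656 Mbit/s is illustrative, not taken from [24].

```python
# Back-of-the-envelope sketch (assumption: the source block fills at the
# media bit rate) of the AL-FEC encoding contribution to startup delay.

def alfec_startup_delay(k_min: int, symbol_bytes: int, media_bps: float) -> float:
    """Seconds of live media needed to accumulate one source block of
    k_min symbols of symbol_bytes bytes each."""
    return (k_min * symbol_bytes * 8) / media_bps

# e.g. Kmin = 64 symbols of T = 16 bytes at 1.0656 Mbit/s:
alfec_startup_delay(64, 16, 1.0656e6)   # roughly 0.0077 s per block
```

The linear growth in Kmin is why Figure 4.11 shows startup delay increasing with Kmin.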

IP packets (RLC-SDUs (Service Data Units)) are mapped into fixed-length RLC-PDUs (Protocol Data Units). A 3GPP RAN1-endorsed two-state Markov model can be used to simulate LTE RLC-PDU losses, as shown in Figure 4.10. A state is good if it has less than 10% packet loss probability for the 1% and 5% BLER simulations, or less than 40% packet loss probability for the 10% and 20% BLER simulations.

images

Figure 4.10 Markov model for simulating LTE RLC-PDU losses

The parameters in the figure are as follows: p is the transition probability from a good state to a bad state; q is the transition probability from a bad state to a good state; pg is the BLER in a good state; pb is the BLER in a bad state. It can be seen that the RAN model described above does not capture the coverage aspect of a cell, since it is the same for all users. For a more comprehensive end-to-end analysis, the following model can be used.

Instead of using a Markov model for all the users as above, a separate Markov model for each user in a cell can be used [24]. The received SINR data for each user is then used to generate a Multicast Broadcast Single-Frequency Network (MBSFN) sub-frame loss pattern. Such data can be collected for different MCS (Modulation and Coding Scheme) values. Using the sub-frame loss pattern for a given MCS, separate Markov models can be generated for each user in a cell. Note that this model is not fundamentally different from the RAN-endorsed model, but it accounts for the varying BLER distribution across users in a cellular environment. The BLER distribution depends on the specific deployment models and assumptions and could be different subject to different coverage statistics.
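Both variants rest on the same two-state (Gilbert-Elliott-style) chain; the per-user model simply instantiates one chain per user with its own parameters. A simulation sketch follows; the parameter values are illustrative, not the 3GPP-endorsed ones.

```python
import random

# Sketch of the two-state Markov RLC-PDU loss model of Figure 4.10:
# p = P(good -> bad), q = P(bad -> good), pg/pb = BLER in good/bad state.
# Parameter values below are illustrative only.

def simulate_losses(n, p, q, pg, pb, seed=0):
    """Return a list of n booleans; True means the RLC-PDU was lost."""
    rng = random.Random(seed)
    good = True
    losses = []
    for _ in range(n):
        losses.append(rng.random() < (pg if good else pb))
        if good:
            good = not (rng.random() < p)   # stay good with prob 1 - p
        else:
            good = rng.random() < q         # recover with prob q
    return losses

# Long-run fraction of time spent in the bad state is p / (p + q).
```

With per-user (p, q, pg, pb) tuples derived from each user's SINR trace, the same function reproduces the per-user variant described above.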

The performance bounds for eMBMS can be evaluated under different conditions. The bearer bit rate is assumed to be 1.0656 Mbits/s. Publicly available video traces can be used for video traffic modeling (http://trace.eas.asu.edu). Video traces are files mainly containing video-frame time stamps, frame types (e.g., I, P, or B), encoded frame sizes (in bits), and frame qualities (e.g., PSNR) in a Group of Pictures (GoP) structure. The length of an RLC-SDU is taken as 10 ms. The content length is set at 17,000 frames for each video trace, and the video-frame frequency is taken to be 30 frames/s. The video frames are then used to generate source blocks, and encoding symbols are generated using the AL-FEC framework (Raptor or RaptorQ). The system-level simulations offer useful insights into the effect of system-level and AL-FEC parameters on the overall QoE.

Different QoE metrics can be considered for multimedia delivery to mobile devices. In the case of file download or streaming of stored content on user request, there is an initial startup delay, after which video streaming occurs; QoE can be measured by the initial startup delay and the fraction of time during which rebuffering occurs. The main contribution to startup delay for eMBMS live streaming is the AL-FEC encoding delay (i.e., the service provider has to wait for a sufficient number of frames to be generated to ensure a large enough source block for efficient AL-FEC implementation). The source symbol size is chosen as T = 16 bytes. It is kept small in order to decrease the initial startup delay, so that a larger value of K can be chosen for the same source block.

The average startup delay (averaged over code rates K/N = 0.6, 0.7, 0.8, 0.9) is plotted in Figure 4.11 as a function of Kmin. As expected, the startup delay increases with increasing Kmin. The average PSNR of the received video stream is calculated using the offset trace file used for the simulations. When a frame is lost, the client tries to conceal the loss by repeating the last successfully received frame. The rebuffering percentage is defined as the fraction of time that video playback is stalled on the mobile device. For live streaming, rebuffering occurs whenever two or more consecutive frames are lost: the client repeats the last successfully received frame and the video appears stalled to the user, with playback resuming as soon as a subsequent frame is received successfully. The empirical Cumulative Distribution Functions (CDFs) of the PSNR and the rebuffering percentage for code rates 0.9 and 0.8 are shown in Figures 4.12 and 4.13, respectively; Kmin is fixed at 64. For detailed simulation parameters and algorithms, refer to [24]. It can be observed that improving the code rate improves the coverage from a QoE perspective, as it guarantees better PSNR and rebuffering performance for more users.
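The live-streaming rebuffering metric just defined can be computed directly from a frame-loss pattern. One convention is sketched below (a single lost frame is concealed without a stall; a run of two or more lost frames counts as stalled time); [24] may draw the boundary slightly differently.

```python
# Hedged sketch of the live-streaming rebuffering percentage: count the
# fraction of frame slots falling in loss runs of length >= 2 (single losses
# are concealed by frame repetition and do not stall playback).

def rebuffering_percentage(lost_flags):
    """lost_flags: per-frame loss indicators (truthy = lost)."""
    stalled = run = 0
    for lost in list(lost_flags) + [False]:   # sentinel flushes the last run
        if lost:
            run += 1
        else:
            if run >= 2:
                stalled += run
            run = 0
    return 100.0 * stalled / len(lost_flags)

rebuffering_percentage([0, 1, 0, 1, 1, 1, 0, 0])   # -> 37.5 (3 of 8 slots)
```

Feeding in the loss pattern produced by the Markov channel model yields the per-user rebuffering statistics plotted in Figure 4.13.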

images

Figure 4.11 Startup delay as a function of Kmin

images

Figure 4.12 Performance comparisons for K/N = 0.8, 0.9: average PSNR

images

Figure 4.13 Performance comparisons for K/N = 0.8, 0.9: rebuffering percentage

4.7 Server–Client Signaling Interface Enhancements for DASH

One of the most common problems associated with video streaming is the clients' unawareness of server and network conditions. Clients usually issue requests based on their bandwidth, unaware of the server's status which comprises factors such as

  • the server's maximum upload rate and
  • the number of clients streaming from the same server at the same time.

Thus, clients tend to request segments belonging to the representations with the highest possible bit rates based on their own perception, regardless of the server's condition. This behavior often causes clients to compete for the available bandwidth and to overload the server. As a result, clients may encounter playback stalls and pauses, which deteriorate QoE. Figure 4.14 shows a typical example of multiple clients streaming simultaneously. Initially, with only one streaming client, the available bandwidth for content streaming is high and the client gets the best possible quality for its bandwidth. As more clients join the streaming process, the clients start to compete for the bandwidth and consequently QoE drops. Greedy clients tend to eat up network bandwidth and stream at higher quality, leaving the rest of the clients to suffer much lower QoE.

images

Figure 4.14 Download rate (throughput) for four clients streaming content from the same server and network

Existing load-balancing algorithms blindly distribute bandwidth equally among streaming clients. However, an equal bandwidth-sharing strategy might not always be the best solution, since equal bandwidth does not necessarily yield equal QoE. For example, fast- or complex-motion content, as in soccer or action movies, typically requires more bandwidth to achieve the same quality as low-motion content, such as a newscast.

In our proposed solution, both the clients and the server share additional information through a feedback mechanism. Such information includes

  • the average QoE measured at the client side and
  • the server's upload rate.

The clients notify the server of the QoE they have perceived so far, in the form of statistics regarding the client's average requested bit rate, average Mean Opinion Score (MOS), number of buffering events, etc. Other quality metrics can also be used. The server uses this client information to perform quality-based load balancing.

The server in return advises each client of the bandwidth limit it can request; in other words, the server can notify each client which DASH representations may be requested at any given time. This is achieved by sending the clients a special binary code, the Available Representation Code (ARC). The ARC includes one bit per representation, the Representation Access Bit (RAB), which can be either 0 or 1. The rightmost bit in the ARC corresponds to the representation with the highest bit rate, while the leftmost bit corresponds to the representation with the lowest bit rate.

As the server's upload rate fluctuates, the server starts limiting the representations available to the clients. It deactivates representations in such a manner that, at any point in time, the maximum total bit rate requested by all clients does not exceed the server's upload rate. By defining such limits, the server is at less risk of being overloaded, so there are fewer delays in content transfer, leading to higher QoE on the streaming clients' side. The selection of which representations to enable or disable is governed by server-side algorithms.
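The ARC bit layout just described can be encoded and decoded in a few lines. The helper names below are hypothetical, introduced only for illustration; the bit convention follows the text (rightmost bit = highest-bit-rate representation).

```python
# Sketch of the ARC bitmask described above. Representation indices follow
# the text's convention: index 0 is the highest-bit-rate representation.
# Function names are illustrative, not from the chapter.

def make_arc(active, n_reps):
    """active: set of representation indices a client may request.
    Returns the ARC bit string (leftmost bit = lowest-bit-rate rep)."""
    return "".join("1" if (n_reps - 1 - pos) in active else "0"
                   for pos in range(n_reps))

def allowed_reps(arc):
    """Invert an ARC string back into the set of allowed representation indices."""
    n = len(arc)
    return {n - 1 - pos for pos, bit in enumerate(arc) if bit == "1"}

make_arc({1, 2, 3, 4}, 5)   # -> "11110": R0, the highest rate, is disabled
```

This reproduces the ARC values traced in Tables 4.2 and 4.3, where each deactivation clears one bit from the right.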

Different outcomes regarding the collective QoE of the streaming clients can be achieved depending on the algorithm selected for representation (de-)activation. In scenarios where the server gets overloaded with requests, limiting the representations available to clients can be useful in different ways. In the following sub-sections, two load-balancing approaches based on our server-assisted feedback will be described. These approaches are:

  1. Minimum QoE reduction. In this approach, the target of server load balancing is to minimize significant drops in quality for each user.
  2. Same average QoE. In this approach, the target is to have all clients perceive the same average QoE over time.

4.7.1 The Minimum Quality Reduction Approach

This algorithm's main focus is to minimize significant drops in quality on a per-user basis. In other words, higher priority is given to users who will experience a bigger gap in quality in case a representation is to be deactivated. This approach uses iterative steps to select the representation to be disabled or enabled when the server's upload rate changes or clients' requests exceed the maximum server upload rate. The procedure can be summarized as follows:

  1. For each client, compute the quality reduction when the maximum available representation is disabled.
  2. Disable the representation that causes minimum quality reduction.
  3. Compute the sum of the bit rates of the maximum representation available at each client.
  4. If the sum computed in step 3 is still higher than the maximum available upload rate, return to step 1; otherwise, go to step 5.
  5. Stop.

As an example, Table 4.1 lists the typical quality changes experienced by two clients streaming different contents at different bit rates. Table 4.2 lists the corresponding outcome per iteration when the maximum upload rate changes from 2000 kbits/s to 1200 kbits/s.

Table 4.1 Average PSNR per representation (bit rate) for two different contents

Representation i   Ri (kbits/s)   Client 1: PSNR   ΔPSNRi   Client 2: PSNR   ΔPSNRi
0                  1000           46.13            –        37.97            –
1                  800            43.70            2.43     37.49            0.48
2                  600            40.57            5.56     36.80            1.17
3                  400            36.58            9.55     34.49            3.48
4                  300            32.14            13.99    32.55            5.42

Table 4.2 Tracing table for Algorithm 1 in case available bandwidth drops from 2000 kbits/s to 1200 kbits/s


Iteration   Client 1: Ri/bit rate   ΔPSNRi   ARC     Client 2: Ri/bit rate   ΔPSNRi   ARC     Total bandwidth (kbps)
1           R0/1000                 –        11111   R0/1000                 0        11111   2000
2           R1/800                  2.43     11111   R1/800                  0.48     11110   1800
3           R1/800                  2.43     11111   R2/600                  1.17     11100   1600
4           R1/800                  2.43     11110   R3/400                  3.48     11100   1400
5           R2/600                  5.56     11110   R3/400                  3.48     11000   1200

In Table 4.1:

  • Rmax is the highest-bit-rate representation;
  • PSNR is the corresponding average quality of each representation;
  • ΔPSNRi is the difference between the quality of the highest representation and representation i.

The action performed at each iteration can be explained as follows:

  1. At this stage, the total bandwidth requested by the clients (2000 kbits/s) exceeds that permissible by the server (1200 kbits/s) and thus a few representations need to be disabled.
  2. The drop in quality experienced by disabling Client 2/R0 is less than the drop in quality experienced by disabling Client 1/R0 and thus Client 2/R0 is deactivated by setting its RAB to 0.
  3. The new total bandwidth (1800 kbits/s) is still higher than the maximum available, so Client 2/R1 will be disabled as this still leads to a smaller drop in quality.
  4. The new total bandwidth (1600 kbits/s) is still higher than the maximum available. At this point, the quality drop experienced by Client 1 upon deactivation of Client 1/R1 is less than that experienced by Client 2 upon deactivating Client 2/R2, thus Client 1/R1 is disabled.
  5. Given the new total bandwidth (1400 kbits/s), Client 2/R2 is disabled and since the target bandwidth has now been reached, the algorithm stops.

Since the server is no longer overloaded, clients are at less risk of buffering stalls. In our approach, we used PSNR as a balancing criterion. PSNR values are pre-calculated for each DASH segment and stored in the DASH MPD. Other criteria, such as the MOS, can also be used with the same approach.
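The iterative procedure of Section 4.7.1 can be sketched compactly, using the per-representation (bit rate, PSNR) ladders of Table 4.1 and cumulative ΔPSNR relative to the highest representation as the comparison metric.

```python
# Sketch of the minimum-quality-reduction balancing loop (Section 4.7.1).
# clients: one ladder per client, each a list of (bitrate_kbps, psnr) pairs
# sorted from highest to lowest bit rate.

def min_quality_reduction(clients, max_upload_kbps):
    """Return, per client, the index of the highest representation still
    enabled once the total peak demand fits the server's upload rate."""
    top = [0] * len(clients)   # current max available representation index
    while sum(c[top[k]][0] for k, c in enumerate(clients)) > max_upload_kbps:
        # quality drop vs. the highest rep if each client's current max
        # representation were disabled (clients already at the bottom skip)
        candidates = [(clients[k][0][1] - clients[k][top[k] + 1][1], k)
                      for k in range(len(clients))
                      if top[k] + 1 < len(clients[k])]
        _, k = min(candidates)   # disable the rep causing the smallest drop
        top[k] += 1
    return top

table41 = [  # (bit rate, PSNR) ladders from Table 4.1
    [(1000, 46.13), (800, 43.70), (600, 40.57), (400, 36.58), (300, 32.14)],
    [(1000, 37.97), (800, 37.49), (600, 36.80), (400, 34.49), (300, 32.55)],
]
min_quality_reduction(table41, 1200)   # -> [1, 3]
```

The result [1, 3] (Client 1 capped at 800 kbits/s, Client 2 at 400 kbits/s, total 1200 kbits/s) matches the final state traced in Table 4.2.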

4.7.2 Same Average Quality Approach

The minimum quality reduction approach mainly exploits feedback sent from the server to clients but not the other way around. The approach discussed in this section exploits two-way feedback.

In this algorithm, the main focus is to set all clients to approximately the same average quality. The iterative procedure is described as follows:

  1. Compute the sum of the bit rates of the maximum representation available at each client.
  2. If the sum calculated in step 1 exceeds that supported by the server, go to step 3 (overflow phase). Else, if the sum is found to be less than the server's upload rate, go to step 4 (underflow phase).
  3. Overflow phase:
    (a) Find the client with the highest average quality (the average quality value is computed by the client and sent to the server via a feedback mechanism). If the client has more than one representation activated, go to step 3(b); else go to step 3(c).
    (b) i. Set the client update flag to true.
      ii. Deactivate the highest representation.
      iii. Replace the client's average quality with the average quality of the maximum representation now permissible to that client.
      iv. If the sum of the bit rates of the maximum representation available at each client still exceeds the server's upload rate, go to step 3(a); else stop.
    (c) Remove the client found in step 3(a) from the list of candidates. If the list of candidates is empty and the client update flag is false, go to step 5; else if the list of candidates is empty but the client update flag is true, stop; else go to step 3(a).
  4. Underflow phase:
    (a) Find the client with the lowest average quality. If the client has at least one deactivated representation and activating that representation will not cause an overflow, then:
      i. Set the client update flag to true.
      ii. Activate the representation.
      iii. Set the client's average quality to the average quality of the maximum representation permissible to it.
      iv. Go to step 4(c).
      Else, remove the client from the list of candidates and go to step 4(b).
    (b) If the list of candidates is empty and the client update flag is false, go to step 5; else if the list of candidates is empty but the client update flag is true, stop; else go to step 4(c).
    (c) Recompute the sum of the bit rates of the maximum representation available at each client. If an underflow still occurs, go to step 4(a); else stop.
  5. Stable phase:
    (a) Set MaxClient to the client with the highest average quality (MaxQ).
    (b) Set MinClient to the client with the least average quality (MinQ).
    (c) If the following conditions are all satisfied, go to step 5(d); else stop:
      – The difference between MaxQ and MinQ exceeds a specified threshold.
      – MaxClient has more than one activated representation.
      – MinClient has at least one deactivated representation.
      – The quality of the maximum representation available to MaxClient exceeds that available to MinClient.
    (d) Deactivate a representation from MaxClient. If the bandwidth saved as a result of the deactivation suffices to enable a representation for MinClient, activate that representation. Go to step 4(c).

Using the same values as in Table 4.1, an illustrative example is shown in Table 4.3, where the server's upload rate is again set to 1200 kbits/s.

Table 4.3 Tracing results for Algorithm 2 in case the bandwidth allowed drops from 2000 kbits/s to 1200 kbits/s


Iteration   Client 1: Ri/bit rate   Max PSNRi   ARC     Client 2: Ri/bit rate   Max PSNRi   ARC     Total bandwidth (kbps)
1           R0/1000                 45          11111   R0/1000                 37          11111   2000
2           R1/800                  43.70       11110   R0/1000                 37.97       11111   1800
3           R2/600                  40.57       11100   R0/1000                 37.97       11111   1600
4           R3/400                  36.58       11000   R0/1000                 37.97       11111   1400
5           R3/400                  36.58       11000   R1/800                  37.49       11110   1200

The details of each iteration step can be explained as follows:

  1. The PSNR values stated at the first iteration correspond to the average PSNR calculated by the clients and sent to the server. At this stage the total bandwidth that can be requested by the clients exceeds that permissible by the server, and thus we need to disable a few representations.
  2. Client 1 has the higher average quality (PSNR = 45), so a representation is deactivated on its side and its average quality is replaced by that of the highest representation still permissible (PSNR = 43.70).

The algorithm continues until the sum of the highest bit rates permissible for each client does not exceed the server's upload rate. Using such an algorithm ensures that all clients stream at almost the same average quality.
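The overflow phase of this algorithm can be sketched as follows, again using the ladders of Table 4.1; the underflow and stable phases are omitted for brevity, and the initial reported qualities (45 and 37) are those of the Table 4.3 trace.

```python
# Sketch of the overflow phase of the same-average-quality algorithm
# (Section 4.7.2): repeatedly deactivate the top representation of the
# client currently reporting the highest average quality.

def equalize_overflow(ladders, reported_quality, max_upload_kbps):
    """ladders: per-client [(bitrate_kbps, psnr), ...], highest rate first.
    reported_quality: initial client-reported average PSNR values.
    Returns the index of each client's highest enabled representation."""
    top = [0] * len(ladders)
    quality = list(reported_quality)
    while sum(lad[top[k]][0] for k, lad in enumerate(ladders)) > max_upload_kbps:
        # highest-quality client that still has a lower representation left
        k = max((k for k in range(len(ladders))
                 if top[k] + 1 < len(ladders[k])),
                key=lambda k: quality[k])
        top[k] += 1
        quality[k] = ladders[k][top[k]][1]   # quality of new max representation
    return top

table41 = [  # (bit rate, PSNR) ladders from Table 4.1
    [(1000, 46.13), (800, 43.70), (600, 40.57), (400, 36.58), (300, 32.14)],
    [(1000, 37.97), (800, 37.49), (600, 36.80), (400, 34.49), (300, 32.55)],
]
equalize_overflow(table41, [45.0, 37.0], 1200)   # -> [3, 1]
```

The result [3, 1] (Client 1 at 400 kbits/s, Client 2 at 800 kbits/s, with nearly equal qualities of 36.58 and 37.49 dB) matches the final row of Table 4.3.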

Experimental results have verified that the use of server-assisted feedback approaches results in:

  • Significant reduction in playback stalls at the client side as well as lower buffering time.
  • Better QoE balancing between multiple clients.
  • Better playback time. This is a side-effect of the reduction in buffering time, since clients usually continue to wait (or repeat the requests) until the required segment is retrieved.

At the same time, there was little or no perceivable quality loss, since clients – once aware of the server load condition – tended to request lower-quality segments to avoid buffering events or long stalls.

4.8 Conclusion

We have given an overview of the latest DASH standardization activities at MPEG and 3GPP and reviewed a number of research vectors that we are pursuing with regard to optimizing DASH delivery over wireless networks. We believe that this is an area with a rich set of research opportunities and that further work could be conducted in the following domains.

  1. Development of evaluation methodologies and performance metrics to accurately assess user QoE for DASH services (e.g., those adopted as part of MPEG and 3GPP DASH specifications [1, 2]), and utilization of these metrics for service provisioning and optimizing network adaptation.
  2. DASH-specific QoS delivery and service adaptation at the network level, which involves developing new Policy and Charging Control (PCC) guidelines, QoS mapping rules, and resource management techniques over radio access network and core IP network architectures.
  3. QoE/QoS-based adaptation schemes for DASH at the client, network, and server (potentially assisted by QoE feedback reporting from clients), to jointly determine the best video, transport, network, and radio configurations toward realizing the highest possible service capacity and end-user QoE. The broad range of QoE-aware DASH optimization problems emerging from this kind of cross-layer cooperation framework includes investigation topics such as QoE-aware radio resource management and scheduling, QoE-aware service differentiation, admission control, QoS prioritization, and QoE-aware server/proxy and metadata adaptation.
  4. DASH-specific transport optimizations over heterogeneous network environments, where content is delivered over multiple access networks such as WWAN unicast (e.g., 3GPP packet-switched streaming [11]), WWAN broadcast (e.g., 3GPP multimedia broadcast and multicast service [19]), and WLAN (e.g., WiFi) technologies.

Notes

References

  1. ISO/IEC 23009-1: ‘Information technology – dynamic adaptive streaming over HTTP (DASH) – Part 1: Media presentation description and segment formats.’
  2. 3GPP TS 26.247: ‘Transparent end-to-end packet switched streaming service (PSS); Progressive download and dynamic adaptive streaming over HTTP (3GP-DASH).’
  3. Sodagar, I., ‘The MPEG-DASH standard for multimedia streaming over the Internet.’ IEEE MultiMedia, 18(4), 2011, 62–67.
  4. Stockhammer, T., ‘Dynamic adaptive streaming over HTTP: Standards and design principles.’ Proceedings of ACM MMSys2011, San Jose, CA, February 2011.
  5. Oyman, O. and Singh, S., ‘Quality of experience for HTTP adaptive streaming services.’ IEEE Communications Magazine, 50(4), 2012, 20–27.
  6. ISO/IEC 23009-2: ‘Information technology – dynamic adaptive streaming over HTTP (DASH) – Part 2: Conformance and reference software.’
  7. ISO/IEC 23009-3: ‘Information technology – dynamic adaptive streaming over HTTP (DASH) – Part 3: Implementation guidelines.’
  8. ISO/IEC 23009-4: ‘Information technology – dynamic adaptive streaming over HTTP (DASH) – Part 4: Segment encryption and authentication.’
  9. ITU-T Recommendation H.222.0|ISO/IEC 13818-1:2013: ‘Information technology – generic coding of moving pictures and associated audio information: Systems.’
  10. ISO/IEC 14496-12: ‘Information technology – coding of audio-visual objects – Part 12: ISO base media file format.’
  11. 3GPP TS 26.234: ‘Transparent end-to-end packet switched streaming service (PSS); Protocols and codecs.’
  12. 3GPP TS 26.244: ‘Transparent end-to-end packet switched streaming service (PSS); 3GPP file format (3GP).’
  13. 3GPP TR 26.938: ‘Improved support for dynamic adaptive streaming over HTTP in 3GPP.’
  14. Akhshabi, S., Begen, A.C., and Dovrolis, C., ‘An experimental evaluation of rate-adaptation algorithms in adaptive streaming over HTTP.’ Proceedings of Second Annual ACM Conference on Multimedia Systems, San Jose, CA, 2011, pp. 157–168.
  15. Ramamurthi, V. and Oyman, O., ‘Link aware HTTP adaptive streaming for enhanced quality of experience.’ Proceedings of IEEE Globecom, Atlanta, GA, 2013.
  16. Ramamurthi, V. and Oyman, O., ‘Video-QoE aware radio resource allocation for HTTP adaptive streaming.’ Proceedings of IEEE ICC, Sydney, Australia, 2014 (to appear).
  17. Shokrollahi, A., ‘Raptor codes.’ Digital Fountain, Technical Report DR2003-06-001, June 2003.
  18. Luby, M., Shokrollahi, A., Watson, M., and Stockhammer, T., ‘Raptor forward error correction scheme for object delivery.’ RFC 5053 (proposed standard), IETF, October 2007.
  19. 3GPP TS 26.346: ‘Multimedia broadcast/multicast service (MBMS): Protocols and codecs.’ Third-Generation Partnership Project (3GPP), 2011. Available at: http://www.3gpp.org/ftp/Specs/archive/26series/26.346/.
  20. Luby, M., Shokrollahi, A., Watson, M., Stockhammer, T., and Minder, L., ‘RaptorQ forward error correction scheme for object delivery.’ RFC 6330 (proposed standard), IETF, August 2011.
  21. Afzal, J., Stockhammer, T., Gasiba, T., and Xu, W., ‘Video streaming over MBMS: A system design approach.’ Journal of Multimedia, 1(5), 2006, 23–35.
  22. Paila, T., Walsh, R., Luby, M., Roca, V., and Lehtonen, R., ‘File delivery over unidirectional transport.’ RFC 6726 (proposed standard), IETF, November 2012.
  23. Bouras, C., Kanakis, N., Kokkinos, V., and Papazois, A., ‘Evaluating RaptorQ FEC over 3GPP multicast services.’ 8th International Wireless Communications & Mobile Computing Conference (IWCMC 2012), August 27–31, 2012.
  24. Kumar, U., Oyman, O., and Papathanassiou, A., ‘QoE evaluation for video streaming over eMBMS.’ Journal of Communications, 8(6), 2013, 352–358.

Acronyms

3GPP – Third-Generation Partnership Project
AL-FEC – Application Layer Forward Error Correction
ARQ – Automatic Repeat Request
AVC – Advanced Video Coding
BLER – Block Error Rate
BMSC – Broadcast Multicast Service Center
BS – Base Station
CDF – Cumulative Distribution Function
CQI – Channel Quality Information
DASH – Dynamic Adaptive Streaming over HTTP
DECE – Digital Entertainment Content Ecosystem
DLNA – Digital Living Network Alliance
DM – Device Management
DRM – Digital Rights Management
eMBMS – Enhanced MBMS
FLUTE – File Delivery over Unidirectional Transport
GMR – Gradient with Minimum Rate
HARQ – Hybrid ARQ
HAS – HTTP Adaptive Streaming
HbbTV – Hybrid Broadcast Broadband TV
HTTP – Hypertext Transfer Protocol
IETF – Internet Engineering Task Force
IP – Internet Protocol
IPTV – IP Television
ISOBMFF – ISO Base Media File Format
LTE – Long-Term Evolution
MBMS – Multimedia Broadcast and Multicast Service
MBSFN – Multicast Broadcast Single-Frequency Network
MCS – Modulation and Coding Scheme
MOS – Mean Opinion Score
MPD – Media Presentation Description
MPEG – Moving Picture Experts Group
NAT – Network Address Translation
OIPF – Open IPTV Forum
OMA – Open Mobile Alliance
PCC – Policy and Charging Control
PDP – Packet Data Protocol
PF – Proportional Fair
PFBF – Proportional Fair with Barrier for Frames
PSNR – Peak Signal-to-Noise Ratio
PSS – Packet-Switched Streaming Service
PVQ – Perceived Video Quality
PVR – Personal Video Recorder
QoE – Quality of Experience
QoS – Quality of Service
RAN – Radio Access Network
RLC – Radio Link Control
RTP – Real-time Transport Protocol
RTSP – Real-Time Streaming Protocol
SDU – Service Data Unit
SINR – Signal-to-Interference-and-Noise Ratio
SSIM – Structural Similarity
TCP – Transmission Control Protocol
TR – Technical Report
TS – Technical Specification
UE – User Equipment
URL – Uniform Resource Locator
WiFi – Wireless Fidelity
WLAN – Wireless Local Area Network
WWAN – Wireless Wide Area Network
W3C – World-Wide-Web Consortium
XLink – XML Linking Language
XML – Extensible Markup Language
