11/23/2020 | News release | Distributed by Public on 11/23/2020 08:09
HTTP Adaptive Segmented (HAS) streaming began to be used at scale from 2008 to 2012, with the advent of Move Networks, Microsoft Smooth Streaming, Apple HLS, Adobe HDS, and MPEG DASH. With the typical 10s segment durations of the day, livestream latencies (measuring latency as the time from an action being filmed to that same action being displayed on a device's screen) remained in the 30s to 60s range, trailing broadcast by a significant degree. Over the next decade, segment durations were reduced down to 2s, bringing with them a concomitant reduction in latency to the 8s to 16s range. That range remains the typical latency for many live events today. The year 2020 then brought the industry a pleasant surprise -- not one, but two HAS standards were released that target latency in the 2s range: Low Latency DASH (LL-DASH) and Low Latency HLS (LL-HLS). Both these standards were developed independently, and while they can be deployed as separate streams in a content delivery system, there are performance and cost gains to be had for packagers, origins, CDNs, and players if both streaming formats can be served by a single-set of media objects.
The HLS specificationwas updated to describe version 10 of the streaming protocol. Among the many improvements, LL-HLS introduces the notion of partial segments ('parts'). Each part can be addressed discreetly via a unique URL, or optionally as a referenced byte-range into a media segment. The vast majority of early implementations have focused on the discreet part-addressing mode. However, range-based addressing brings with it several performance advantages, along with a path to interoperability with LL-DASH solutions and increased CDN efficiency. It also harbors some curious requirements for implementation across general purpose proxy caches.
This article will investigate the problems we can solve with range-based addressing, the requirements it brings to operate effectively, and the benefits we can gain by deploying it at scale.
Let's start by examining cache efficiency at the edge when faced with a mixture of low latency and standard latency HLS and DASH clients, all playing the same content. Caching is the means by which CDNs scale up HTTP-adaptive streams. The more content can be cached, the better the performance and the lower the costs. If we imagine an LL-HLS stream with 4s segments and 1s parts, Figure 1 shows all the objects that will need to be cached at the edge within a 4s window. There are many of them! Some are larger than others and we can highlight this difference by scaling them graphically such that the area is proportional to the size. Figure 1 shows that the video segments take up the largest amount of space.
Notice there is duplication in content between the parts (purple), which are consumed by a low latency client playing at the live edge, and the contiguous media segments (green), which are consumed by standard latency clients, or low latency clients scrubbing behind the live edge. If we were to add in the DASH footprint, we would see in Figure 2 that we have three silos of files, all holding the same media content, yet competing with one another for cache space.
Our goal is to reduce these down to a single silo. This will lower origin storage by a factor of 3 and also triple the cache efficiency for the CDN. This can be achieved through the use of byte-range addressing.
Within an LL-HLS media playlist, a part is described discreetly using a unique URL for every part. For example
This same part can alternatively be described using the BYTERANGE syntax
which specifies the length and offset at which a part is located within a media segment. For PRELOAD HINT parts, for which the last-byte-position is not yet known, only the start of the byte range is signalled:
Figure 3 shows a discreet part playlist on the left, and it's byte-range-addressed equivalent on the right:
Of particular interest to us is the expected origin behavior when faced with the open range request specified by the PRELOAD HINT entry. According to the HLS spec, 'When processing requests for a URL or a byte range of a URL that includes one or more Partial Segments that are not yet completely available to be sent - such as requests made in response to an EXT-X- PRELOAD-HINT tag - the server MUST refrain from transmitting any bytes belonging to a Partial Segment until all bytes of that Partial Segment can be transmitted at the full speed of the link to the client.' This means that the origin must hold back beginning the response until all the bytes of that preload part are available. But what then? The spec continues: 'If the requested range includes more than one Partial Segment then the server MUST enforce this delivery guarantee for each Partial Segment in turn. This enables the client to perform accurate Adaptive Bit Rate (ABR) measurements.' Since our open range request does include more than one part (in fact, it includes all the remaining parts of that segment), the origin should continue to return successive parts down the same response, bursting each part as it becomes fully available. The key point here is that that single request will in fact return all the parts remaining in that segment. Figure 4 illustrates how we can use this fact to derive a common workflow between LL-HLS and LL-DASH.
The lower half of Figure 4 represents the workflow for a client using byte range addressing. At time 0, it makes an open-ended range request against segment 1. The origin blocks the response until the entirety of part 1 is available and then it begins an aggregated response back to the client. I use the term 'aggregated' carefully here. If this were http/1.1, it would be a chunked transfer response, however since LL-HLS mandates the use of http/2, and http/2 has framing, this is simply an aggregating http/2 response. Notice that bytes are injected into the byte-addressed response at the exact same time as they are released down the wire for the discreet-addressed parts. The two approaches are latency-equivalent. Also, importantly -- the aggregating response in the byte-addressed case is exactly what an LL-DASH client is expecting. DASH clients do not have the constraint that the part (or 'chunk' in their context) must be burst, but this bursting does not hurt them and in fact it helps considerably with their bandwidth estimation.
Let's examine the start-up behavior of a byte-range-addressed LL-HLS client. Consider a client faced with the media playlist at start-up (post tune-in) in Figure 5.