The VP8 and H.264 codecs are both mandatory to implement for WebRTC compliance. Simulcast is a way to run multiple encoders at the same time to provide different resolutions of the same media to choose from, as a way to adapt to bandwidth fluctuations (among other good things). Unfortunately, while some patches were proposed some two years ago, by HighFive among others, libwebrtc did not implement support for simulcast with the H.264 codec. H.264 was thus a de facto secondary codec, and Safari, which only supported H.264, could not achieve the same level of adaptation (or quality) that other browsers could with VP8. This blog post gives more details about the epic journey to get that done, the design of the implementation, and the impact for WebRTC products.
Funnily enough, if you look at the atomic steps you need to go through to send media over the internet, whether real-time or not (i.e. whether WebRTC or HLS), you will end up with the above diagram. The “codec” only really touches on the light blue section, while each half of the line (capturer to internet, and internet to display, respectively) is referred to as the “media engine”. You can see that there is much more to streaming media than just the codec.
In a previous post, we spoke a little bit about the difference between codecs and media engines. You can take another look if you really want to go into the details:
In the case of pre-recorded media, or streaming that is not latency-sensitive (slow streaming), you can optimise this, and it starts looking like a usual HLS / CDN solution (see below). In those cases, the first half of the line is not time-sensitive at all, and this leads to some asymmetry between the preparation of the media (up to the upload to a CDN) and the serving of the media, which is more time-sensitive.
A lot of people trying to optimise the streaming of pre-recorded content will address only one of those halves.
For example, in the WebRTC community, peer5, streamroot.io and others address the second half by offloading some of the CDN-to-viewer content to a WebRTC datachannel-based p2p network created on the fly between viewers of the same media.
Meanwhile, in the codec community, AOMedia’s AV1, and any new codec for that matter, addresses the entire line by aiming for a better compression ratio, which reduces the overall bandwidth need. While AV1 has SVC capability, it is likely that it will not be used for bandwidth adaptation in those cases. At AOMedia, the Real-Time group is a sub-group of the Codec group, which illustrates the order of priorities.
A lot of companies in the blockchain / dApps / web3 ecosystem are taking this streaming model and trying to optimise different segments separately: generation, transcoding, storage, distribution, …. While most aim at pre-recorded content, a.k.a. VOD (VideoCoin, theta, livepeer, viewly, …), only a few can do live streaming with a delay of around 5s (livepeer, IPBC, …), and only one today is aiming at real-time (less than 1s): spankchain.
In any case, you end up having the same overall problems. We will focus on the following two:
Before the advent of SVC codecs, the main solution was to produce multiple encodings of the same media at different resolutions on the sender side, together with a mechanism to choose which resolution would be used by the receiving side.
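As a rough illustration of the "mechanism to choose" part, here is a minimal sketch of picking one encoding out of several based on the bandwidth estimated for a receiver. The `Layer` shape, the `pickLayer` helper, the layer names, and the bitrate values are all assumptions made for the example; they are not taken from libwebrtc or from any media server.

```typescript
// One encoding of the source: same media, different resolution and bitrate.
// The field names are hypothetical, chosen for this sketch only.
interface Layer {
  rid: string;        // label identifying the encoding
  height: number;     // vertical resolution in pixels
  bitrateBps: number; // approximate bitrate the encoding needs
}

// Illustrative values only (assumed, not libwebrtc defaults).
const ladder: Layer[] = [
  { rid: "low", height: 180, bitrateBps: 150000 },
  { rid: "mid", height: 360, bitrateBps: 500000 },
  { rid: "high", height: 720, bitrateBps: 1500000 },
];

// Return the highest-bitrate layer that fits in the available bandwidth.
// When nothing fits, fall back to the smallest layer rather than sending nothing.
function pickLayer(layers: Layer[], availableBps: number): Layer {
  const sorted = layers.slice().sort((a, b) => a.bitrateBps - b.bitrateBps);
  const smallest = sorted[0];
  if (smallest === undefined) {
    throw new Error("at least one layer is required");
  }
  let chosen = smallest;
  for (const layer of sorted) {
    if (layer.bitrateBps <= availableBps) {
      chosen = layer;
    }
  }
  return chosen;
}
```

With this ladder, a receiver estimated at 600 kbps would be given the "mid" encoding, and would move to "high" or "low" as its estimate goes up or down.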
Streaming / broadcast protocols like HLS or MPEG-DASH implemented this with transcoders, file-based chunking, and buffering, all of which induced additional delay. Real-time protocols like WebRTC used “simulcast”.
The usual streaming design (HLS) encodes once at the source, uploads this high-resolution media to a transcoder, and then transcodes it into segments of the same duration, one set for each resolution. Most of those streaming “live” through YouTube, Twitch, Dailymotion or any other streaming service use OBS Studio, which captures, encodes and streams over Flash to an ingress node, which in turn transcodes.
Simulcast does exactly what transcoders do for HLS, but directly at the source. That keeps things real-time (removing a decoding + encoding cycle in the process) at the cost of an overhead (CPU, upload bandwidth) on the sender side, as the source machine now needs to have the same capacity an HLS transcoder otherwise has.
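To put rough numbers on that sender-side overhead, the sketch below describes three encodings and computes how much more the sender uploads and encodes compared to sending the top layer alone. The `Encoding` interface mirrors three fields of the standard `RTCRtpEncodingParameters` dictionary (`rid`, `scaleResolutionDownBy`, `maxBitrate`); the bitrates are assumed values, and using pixel count as a proxy for CPU cost is a simplification.

```typescript
// Local stand-in for the subset of RTCRtpEncodingParameters used here.
interface Encoding {
  rid: string;
  scaleResolutionDownBy: number; // 1 = full resolution, 2 = half width and height
  maxBitrate: number;            // bits per second
}

// Assumed example values, not libwebrtc defaults.
const sendEncodings: Encoding[] = [
  { rid: "q", scaleResolutionDownBy: 4, maxBitrate: 150000 },
  { rid: "h", scaleResolutionDownBy: 2, maxBitrate: 500000 },
  { rid: "f", scaleResolutionDownBy: 1, maxBitrate: 1500000 },
];

// Worst-case upload: the sender pushes every layer at the same time.
function totalUploadBps(encodings: Encoding[]): number {
  return encodings.reduce((sum, e) => sum + e.maxBitrate, 0);
}

// Upload cost relative to sending only the highest-bitrate layer.
function uploadOverheadRatio(encodings: Encoding[]): number {
  const top = encodings.reduce((max, e) => Math.max(max, e.maxBitrate), 0);
  if (top === 0) {
    throw new Error("at least one encoding with a non-zero bitrate is required");
  }
  return totalUploadBps(encodings) / top;
}

// Pixels encoded per frame relative to the full-resolution layer alone.
// Scaling both dimensions by s divides the pixel count by s * s.
function relativePixelLoad(encodings: Encoding[]): number {
  return encodings.reduce(
    (sum, e) => sum + 1 / (e.scaleResolutionDownBy * e.scaleResolutionDownBy),
    0,
  );
}
```

With these assumed values, the sender uploads 2.15 Mbps instead of 1.5 Mbps (about 1.43 times more) and encodes about 1.31 times as many pixels per frame. In a browser that implements the standard API, an array shaped like `sendEncodings` is what would be passed to `RTCPeerConnection.addTransceiver(track, { direction: "sendonly", sendEncodings })`; whether H.264 simulcast is actually produced depends on the browser version.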
For many, simulcast is the minimum requirement for industry-grade media quality. Jitsi, for example, decided not to support Safari on iOS in their official SDKs because of the lack of simulcast support with H.264, and is using a native client instead. For those interested in all the details, or those who want to hear it from Emil himself, the recording is public, and the interesting part is around 18 minutes in.
So far, libwebrtc, the WebRTC media engine implementation used in Chrome, Firefox and Safari, did not support simulcast in conjunction with the H.264 codec. Since Apple supports only H.264, this led to a poorer experience with H.264 in general, and with Safari specifically.
While a patch extending the libwebrtc simulcast implementation to H.264 had been proposed by HighFive almost two years ago (and used in their Electron native client for just as long), and while it’s likely that many had done the same thing in their native apps, it had never been merged. So much time had passed that the underlying C++ classes had changed too much for this patch to even be reused.
Starting anew from the VP8 simulcast implementation, CoSMo’s Media Server Lead Sergio Murillo, whom most of you must know from his article on SVC, wrote a patch and submitted it for review around the end of March. Three months, 45 reviews with as many rebases, and quite a few face-to-face meetings all around the globe later, we are happy to announce that the patch was merged today!
The main impact is that you will no longer have to choose between using the H.264 codec and having a good quality media engine. Using Chrome Canary this week, and Chrome stable in around 12 weeks, you will be able to send H.264 simulcast.
Firefox and Edge have not commented on their intent to implement it. It is likely that Firefox will adopt it soon, as their WebRTC code uses the libwebrtc media engine. Apple is planning to implement it during the web engine hackfest, in the first week of October, with CoSMo’s help.
Simulcast is a sender side feature used to address bandwidth fluctuation on the receiving side. That means you do not need any modification on the receiving side to support simulcast, only on the sender side, and possibly in your media server. That also means you do not need to wait until this is implemented in all the browsers, to benefit from it.
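To make the "nothing changes on the receiving side" point concrete, here is a minimal sketch of the forwarding decision a media server could make. The `SimulcastForwarder` class, the `Packet` shape, and the receiver names are hypothetical and written for this example; this is not how Medooze or any other specific server implements it, and a real server also has to rewrite sequence numbers and switch layers on keyframes, which is left out here.

```typescript
// Minimal stand-in for an incoming media packet tagged with its simulcast layer.
interface Packet {
  rid: string; // which encoding this packet belongs to
  seq: number; // sequence number within that encoding
}

// Keeps track of which single layer each receiver is currently given.
class SimulcastForwarder {
  private selected: Record<string, string> = {}; // receiverId -> rid

  // Called whenever the bandwidth estimate for a receiver changes.
  select(receiverId: string, rid: string): void {
    this.selected[receiverId] = rid;
  }

  // Return the receivers that should get this packet.
  // Each receiver only ever sees one ordinary stream, so it needs no simulcast support.
  route(packet: Packet): string[] {
    return Object.keys(this.selected).filter(
      (receiverId) => this.selected[receiverId] === packet.rid,
    );
  }
}
```

For example, after `select("safari-ios", "low")` and `select("desktop", "high")`, a packet from the "low" encoding is routed only to "safari-ios", and a packet from an encoding nobody selected is simply dropped at the server.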
In the one-to-many streaming/broadcast use case, all you need is to make sure the sender uses Chrome. There is no additional work to do to support any browser on the receiving side, including Safari on iOS.
For the many-to-many video conferencing use case, where all the clients need to be able to send *and* receive, you will have to wait for all browsers of interest to fully support it. It’s likely that most media servers will very quickly upgrade their support for this (they already support simulcast for VP8), so that you do not have to bother at all.
At CoSMo, we have already made our media server, Medooze, compatible with the changes. All our products and services, and our customers’, are going to be upgraded before the end of the month and tested right away thanks to KITE, the same tool that tests WebRTC implementations in all browsers today, and which is used by callstats.io, for example, to validate stats implementations in all browsers.
If you too want to get an edge over your competitors, use CoSMo’s expert services and/or products.
In this day and age, where everybody and their mom claims to be a WebRTC expert (and a fearless visionary leader), remember that talk is cheap, and ask to see or test the code!