As more and more communications moved to IP (Internet Protocol), new applications like WebRTC (Web real-time communications) expanded the capabilities of traditional VoIP (Voice over Internet Protocol), and Plain Old Telephone Service (POTS) began to go the way of the dodo. Together, these shifts created the need for an audio codec designed for the extremely dynamic nature of today’s packet-switched Internet. As with most foundational Internet technologies, there was a strong desire to keep the codec royalty free. To that end, the IETF formed the CODEC working group and in 2012 published RFC 6716, which standardized the Opus audio codec. Today, Opus is seeing success in the market because of both of these characteristics, its adaptability and its royalty-free status, and because of a high-quality, open-source reference implementation that keeps improving over time.
Opus began as two unrelated audio coding projects. Skype began the SILK codec in 2007 as a variable-rate speech codec for narrowband to super-wideband speech. Almost at the same time, CELT was being created by Xiph.Org contributors as a high-quality, ultra-low delay audio codec aimed at the most demanding interactive audio applications. The two complementary technologies were combined in 2010 as part of the IETF audio codec effort started one year earlier.
The creation of an audio codec working group within the IETF was subject to much controversy. While Opus is not the first codec to be stamped by the IETF, it is the first to be developed within a dedicated IETF working group and published on the standards track. Described as “one of (if not the) most technically complex pieces of work that has been presented to the IETF” by its Gen-Art reviewer, the effort also raised new issues within the IETF, such as how to specify a standard as C code.
To cover a wide range of network conditions, Opus supports a wide array of quality and bitrate options:
- Bitrates from 6 kbit/s to 510 kbit/s
- Narrowband (8 kHz) to fullband (48 kHz) audio
- Frame sizes from 2.5 ms to 60 ms
- Speech and music support
- Mono and stereo
- Flexible rate control
As network conditions change, all of the abovementioned settings may be dynamically changed in real time without causing audible artifacts or other glitches. Its rate control can generate constant bitrate (CBR) streams, such that each packet is exactly the size requested, or variable bitrate (VBR) streams, which target a specific quality, optionally constrained to impose a bound on required buffering or to respect an absolute maximum rate. All of this makes Opus suitable for almost all audio applications, including:
- VoIP and videoconferencing (e.g., WebRTC)
- Music streaming
- Music files and audiobooks
- Low-delay broadcast reporting
- Wireless audio equipment
- Network music performance
In addition to making things easier for implementers—one codec can deliver best-in-class performance where five or six different codecs would have been required before—Opus’s wide range of applications also helps reduce transcoding when linking different applications (e.g., streaming a videoconference).
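As a rough illustration of the constant-bitrate mode and the frame sizes listed earlier, the following Python sketch computes how many audio samples one frame covers and roughly how many payload bytes a CBR packet carries. The function names and the round-down policy are assumptions made for this example; the reference encoder may round differently, and RTP/UDP/IP header overhead is not included.

```python
from fractions import Fraction

# Frame durations (in milliseconds) that an Opus frame can have.
VALID_FRAME_MS = (2.5, 5, 10, 20, 40, 60)


def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of samples per channel covered by one frame."""
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError("unsupported frame duration: %r ms" % frame_ms)
    # Fraction keeps 2.5 ms exact, so the result is an exact integer
    # for the usual 8, 12, 16, 24, and 48 kHz rates.
    return int(sample_rate_hz * Fraction(frame_ms) / 1000)


def cbr_payload_bytes(bitrate_bps, frame_ms):
    """Approximate payload size of one CBR packet holding a single frame.

    Assumption for this sketch: the size is rounded down to a whole byte.
    """
    if not 6000 <= bitrate_bps <= 510000:
        raise ValueError("bitrate outside the 6 to 510 kbit/s range")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError("unsupported frame duration: %r ms" % frame_ms)
    # bits per frame = bitrate * duration; divide by 8 for bytes.
    return int(bitrate_bps * Fraction(frame_ms) / 8000)


if __name__ == "__main__":
    print(samples_per_frame(48000, 20))   # fullband, 20 ms frame
    print(cbr_payload_bytes(64000, 20))   # 64 kbit/s, 20 ms frame
```

With 20 ms frames at 64 kbit/s, every packet carries 160 bytes of payload, which is why CBR streams are easy to budget for on a fixed-capacity link.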
To justify these claims and verify that Opus met its requirements, independent testers compared it to other speech and music codecs. Among these tests, a wideband/fullband speech test conducted by Google found that Opus provided better quality at equal rate than G.719, Speex, G.722.1, and AMR-WB (Adaptive Multi-Rate Wideband).
Figure 1. Google Wideband/Fullband Test
In a test by HydrogenAudio, Opus outperformed Vorbis and both the Nero and Apple HE-AAC encoders on 64 kbit/s music.
Figure 2. HydrogenAudio Test
The results of these tests indicate that Opus delivers better quality than previous state-of-the-art music codecs while maintaining the low delay of communications codecs. As interest in Opus has grown, more organizations have conducted tests, including the European Broadcasting Union, which expects to release the results of a set of listening tests in the near future.
Despite being standardized only last year, Opus is already being adopted in many VoIP and videoconferencing clients. Along with G.711, it is mandatory to implement for the new WebRTC standard, which was used (with Opus) to broadcast the technical plenary at IETF 87. Tieline and vLine use Opus to deliver broadcast contributions. Real-time communications clients, including Jitsi, Meetecho, CounterPath, SFLphone, Mumble, TeamSpeak, and many others, support Opus.
Opus is also being adopted as a music-streaming and music-storage format. It can be used with the HTML5
Opus developers are currently focused on releasing version 1.1 of the Opus implementation. This will be the first major release since version 1.0 was published alongside the RFC. Version 1.1 will include quality improvements that are possible because RFC 6716 only specifies the Opus decoder, thereby allowing smarter encoders in the future. These improvements include much-improved support for surround audio, and bitrate allocation tuning to enable more-uniform audio quality and to lower the average rate required to avoid noticeable artifacts.
Another important feature in 1.1 is the ability to automatically detect whether an input signal is speech or music and adapt the encoding process accordingly. Although the version 1.0 reference implementation could switch between modes, it relied on the user to identify whether the input was speech or music.
At the IETF, the focus is now on encapsulating Opus in both RTP and Ogg with two active working group drafts[15,16]. Encapsulating Opus is straightforward because it signals all mode changes in-band—no out-of-band signaling is required. The SDP (Session Description Protocol) codec parameters carry only informative parameters, which almost completely eliminates the possibility of negotiation failure. In fact, an RTP receiver can correctly decode an Opus stream without ever seeing the SDP.
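To make the in-band signaling concrete, here is a minimal Python sketch that decodes the first byte of an Opus packet, the table-of-contents (TOC) byte described in RFC 6716, Section 3.1. The function name `parse_toc` and the dictionary keys are illustrative choices for this example, not part of any Opus library.

```python
def parse_toc(toc):
    """Decode the TOC byte that starts every Opus packet.

    Bit layout: 5 bits of configuration, 1 stereo bit,
    and a 2-bit frame-count code.
    """
    if not 0 <= toc <= 0xFF:
        raise ValueError("TOC must be a single byte")

    config = toc >> 3            # top 5 bits: mode, bandwidth, frame size
    stereo = bool((toc >> 2) & 1)
    code = toc & 0x3             # bottom 2 bits: frames-per-packet code

    if config < 12:              # configs 0..11: SILK-only
        mode = "SILK-only"
        bandwidth = ("NB", "MB", "WB")[config // 4]
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:            # configs 12..15: hybrid SILK + CELT
        mode = "Hybrid"
        bandwidth = ("SWB", "FB")[(config - 12) // 2]
        frame_ms = (10, 20)[config % 2]
    else:                        # configs 16..31: CELT-only
        mode = "CELT-only"
        bandwidth = ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = (2.5, 5, 10, 20)[config % 4]

    frames = (
        "1 frame",
        "2 frames, equal size",
        "2 frames, different sizes",
        "arbitrary number of frames",
    )[code]

    return {
        "mode": mode,
        "bandwidth": bandwidth,
        "frame_ms": frame_ms,
        "stereo": stereo,
        "frames": frames,
    }


if __name__ == "__main__":
    print(parse_toc(0x78))
```

Because this byte travels with every packet, a receiver can follow changes in mode, bandwidth, frame size, and channel count without any out-of-band renegotiation.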
Next Steps: Video
Watch for the Daala project, a competitive, royalty-free video codec, based on “new” (for video codecs) technology, including lapped transform, frequency-domain intraprediction, and vector quantization.[18, 19, 20]