In a recent exchange in the comments to an article in Nate Hoffelder’s The Digital Reader blog, I mentioned Readium Licensed Content Protection (Readium LCP), the standard DRM scheme for EPUB-formatted e-books that I’ve been working on for the past couple of years. I gave a talk on Readium LCP a couple of weeks ago at the EPUB Summit in Bordeaux, France, so the basic outline of the scheme is now known to the public.
I used the phrase “open source DRM” to describe Readium LCP in The Digital Reader. To this, one commenter remarked, “… open-source DRM sounds amazing. It will be so much easier to break if we know exactly how it works.” Let me explain why this isn’t necessarily the case, and in doing so, explain what we really mean by “open source DRM” and how that fits into the infrastructure of Readium LCP.
The objective of a DRM implementation is to keep hackers from discovering details that would enable them to find the unencrypted (cleartext) content or the keys that they can use to decrypt the content. To discover those details, a hacker needs to analyze the code that runs on the client device. That’s machine code, not source code.
If source code is compiled to machine code in a straightforward way, then knowing the source code certainly does go a long way towards knowing where to look for keys or content in the machine code. But what if the process of producing machine code from source code were unpredictable and varied widely across different deployments of the same DRM? In that case, knowing the source code wouldn’t help much.
Yet that’s exactly how many DRM and conditional access schemes are implemented nowadays. The implementer obfuscates the client code at or after compile time, so that the actual machine code doesn’t reflect the source code. Even better, the implementer does this in a variety of different ways across individual devices or apps. That way, not only is the DRM on a particular device or app hard to reverse engineer, but even if someone finds a technique for discovering keys or cleartext content on one device, that technique may not work on another device using the same DRM.
Code obfuscation is like a one-way function: you can obfuscate the code using known techniques, but it’s very difficult to recover the original code from the obfuscated code. The more sophisticated types of code obfuscation used in today’s DRMs involve techniques such as code diversity (code is obfuscated in N different ways, not just in one way for all implementations) and individualization (a unique device ID is used as input to a method that individualizes code to each device).
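To make the idea of individualization a little more concrete, here is a minimal sketch in Python. It is a toy, not how any commercial hardening tool or Readium LCP actually works: real tools transform machine code, whereas this only masks a stored key. The function names, the device IDs, and the use of SHAKE-256 to derive a mask are all assumptions made for illustration.

```python
import hashlib


def derive_mask(device_id: str, variant: int, length: int) -> bytes:
    """Derive a mask from a device ID and a build-variant number.

    Hypothetical scheme for illustration; real tools keep this step hidden.
    """
    seed = f"{variant}:{device_id}".encode("utf-8")
    return hashlib.shake_256(seed).digest(length)


def individualize(key: bytes, device_id: str, variant: int) -> bytes:
    """Return the masked form of `key` that would be stored on one device."""
    mask = derive_mask(device_id, variant, len(key))
    return bytes(k ^ m for k, m in zip(key, mask))


def recover(blob: bytes, device_id: str, variant: int) -> bytes:
    """Undo the masking; XOR with the same mask is its own inverse."""
    return individualize(blob, device_id, variant)


content_key = bytes(range(16))  # stand-in for a real secret key

# Two devices, two build variants: the stored bytes differ on each.
blob_a = individualize(content_key, "device-A", variant=0)
blob_b = individualize(content_key, "device-B", variant=1)

print(blob_a.hex())
print(blob_b.hex())
print(recover(blob_a, "device-A", 0) == content_key)
print(recover(blob_a, "device-B", 1) == content_key)
```

The two stored blobs differ even though they hide the same key, and a recovery routine built for device B fails on the blob taken from device A. That is the property that matters: an attack worked out on one device does not automatically carry over to another.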
In addition, implementers can add functionality to client code that monitors for activity (such as the use of debuggers) that suggests reverse engineering or code tampering, and shuts down the device or takes other corrective action.
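As a rough illustration of this kind of monitoring, the sketch below checks for an active trace function before running a sensitive routine. It is only an analogy: `sys.gettrace()` detects Python-level tracers such as `pdb`, while hardened native clients use platform-specific checks. The names `TamperDetected`, `guarded_decrypt`, and `toy_decrypt` are hypothetical.

```python
import sys


class TamperDetected(RuntimeError):
    """Raised when the client sees signs of reverse engineering."""


def debugger_attached() -> bool:
    """Return True if a Python trace function (as set by pdb) is active."""
    return sys.gettrace() is not None


def guarded_decrypt(decrypt, ciphertext: bytes) -> bytes:
    """Run `decrypt` only if no tracer is attached; otherwise refuse."""
    if debugger_attached():
        raise TamperDetected("tracing detected; refusing to decrypt")
    return decrypt(ciphertext)


def toy_decrypt(ciphertext: bytes) -> bytes:
    """Placeholder for a real decryption routine (here: reverse the bytes)."""
    return ciphertext[::-1]


if __name__ == "__main__":
    try:
        print(guarded_decrypt(toy_decrypt, b"txetraelc"))
    except TamperDetected as exc:
        # The "corrective action" in this sketch is simply to stop.
        print(f"shutting down: {exc}")
```

A single check like this is easy to find and patch out, which is why it is normally combined with obfuscation so that the check itself is hard to locate.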
This process is known as application security or code hardening. (For more on this, see my white paper on pay TV content security.) There are various free tools available for simple forms of code obfuscation, and there’s a cottage industry of companies that offer more sophisticated tools and techniques for application security, such as Irdeto, Intertrust, and Arxan.
DRMs that are licensed to implementers (as opposed to DRMs that vendors implement by themselves) usually come with sets of rules, called robustness rules, that state requirements for code hardening. This is the case with Hollywood-approved DRMs such as PlayReady (Microsoft), Widevine (Google), and Marlin (Intertrust et al.). Implementers must agree to comply with robustness rules as a condition of licensing the DRM, and in some cases, third-party auditors may be required to examine implementations to make sure that they comply. The robustness rules apply to the entire client application, not just to the DRM routines, because the DRM code passes cleartext content to the rendering code (such as an e-book reading system or video player), which must be trusted as well.
If the client code is hardened sufficiently, then it matters much less that the source code for the DRM is known to the public. On the contrary, open source can increase the efficiency with which some security holes can be plugged once they are discovered in the wild. Open source also means that even the source code of different implementations can differ, as long as they are all compliant with the spec.
And that’s how we have set up Readium LCP. Already there are a few different client implementations of the Readium LCP spec. There’s a GitHub repository, and when Readium LCP is released to the public — expected by the end of this year — a reference implementation will be declared and the code will be made available to anyone under an open source license such as BSD or MIT. The Readium LCP client license agreement will include a set of robustness rules, which we have hired Farncombe, a leading provider of content security services (including robustness audits), to develop.
Like some other DRMs, Readium LCP is an ecosystem of interoperable components. The goal is that anyone with any e-reading app or device that supports Readium LCP should be able to read any e-book from any distributor (retailer, library, subscription service, etc.) that supports Readium LCP. Readium LCP also has features for enabling user-controlled sharing of titles among friends and colleagues, who may be using different e-reading apps. Readium LCP integrates with Readium, the open source e-reader code for EPUB3; Readium is designed so that other DRMs can integrate with it as well.
To become part of this interoperable ecosystem, you also have to obtain cryptographic material that enables your app to be recognized as part of the ecosystem and to decrypt content that has been encrypted on a server within the ecosystem. Signing the license agreement entitles you to receive these keys on the condition that you keep them secret.
On the server side, if you’re a distributor and want to implement Readium LCP, you will need to get a digital certificate, which verifies your identity as a server licensee; the certificates will be provided by the International Telecommunication Union (ITU), the keeper of the X.509 certificate specification, instead of a commercial certificate authority. Licensees will be able to modify the reference source code for their own purposes, as long as they pass tests for compliance with the Readium LCP specification, which will include detailed tests to ensure interoperability.
So, by signing a license agreement for Readium LCP, you agree to meet robustness rules, keep keys secret, and pass compliance tests; in return, you join the interoperable ecosystem. If you don’t want to sign the license agreement, then you’re free to use the spec and/or reference code to develop your own interoperable ecosystem, with your own crypto keys… or not.
Readium LCP came about because while EPUB is a widely used standard for formatting e-books, the EPUB standard does not include DRM. This has led to fragmentation in the market, which has limited EPUB’s universality. The original idea was that Adobe’s e-book DRM (Adobe Content Server/RMSDK) would become the de facto standard DRM for EPUB implementations, but this has not happened. (For example, Apple’s iBooks uses EPUB but uses its own DRM.) Many publishers still require DRM for retail e-book distribution, and DRM is also required for library lending and other types of access models. The basic design for what is now called Readium LCP arose out of (among other things) many discussions between Bill McCoy, Executive Director of the International Digital Publishing Forum (the EPUB standards body), and me.
Some in the EPUB community have also gotten increasingly frustrated with Adobe for its wavering commitment to its e-book DRM over the years and the series of technical and licensing gaffes it has committed. Readium LCP is designed to be free of dependencies on any commercial vendors.
The license administrator of Readium LCP is the European Digital Reading Lab (EDRLab), a nonprofit membership organization based in Paris that is working on various EPUB-related projects. I’m working with EDRLab as it finalizes the license agreements, compliance test suites, digital certificate protocols, and other components of the infrastructure. In this context, EDRLab is analogous to other independent DRM license administrators such as the Marlin Trust Management Organization (MTMO) for Marlin. EDRLab will charge fees to Readium LCP licensees, but only to recover costs, not to make a profit. Much of the adoption of Readium LCP is expected to be in Europe, where there is more variety and competition among e-book platforms than there is here in the United States.
There have been a few other attempts at open source DRM in the past, perhaps the best-known being OpenIPMP from Mutable. None of these have the licensing and cryptographic infrastructure that Readium LCP has. We designed Readium LCP to take many features of the licensing infrastructures of contemporary commercial DRMs and adapt them for use in a collaborative, open-source, nonprofit environment. Readium LCP will be simple to implement and inexpensive to license.
So, to sum up: Readium LCP doesn’t depend on the secrecy of source code for its security, so the fact that its reference code is open source should not be much of a factor. I suspect that, given the way many of today’s Hollywood-approved DRMs are licensed and implemented, knowing the source code might not be of much help in cracking them either. Instead, the security of Readium LCP depends on secrecy of encryption keys and the techniques used to obfuscate machine code, as well as on widely accepted crypto algorithms. Will that be perfect security? No; there’s no such thing. Is it about as good as many of the existing DRMs for e-books that distributors use with titles from major publishers? We think so… though only time will tell for sure.