I will be speaking at the O’Reilly & Associates Tools of Change (TOC) conference on Tuesday, February 10, in NYC. (TOC has quickly become the leading conference on publishing technology; it has filled the hole left by the demise of the lamented Seybold conferences.)
The panel is on Google’s lawsuit settlement with the publishing industry. John Kreisa, an executive from Mark Logic, the makers of XML server software, will be joining me. We will be discussing the settlement’s effect on online business models for book publishers and how publishers can build content infrastructures to take advantage of it. I’m publishing a white paper for the occasion; here it is.
The publishing industry’s litigation with Google was settled last October after three years. The litigation was all about copyright in the digital age. The settlement, in contrast, is all about a set of business models that publishers and Google intend to implement, both immediately and in the future.
The first set of business models (Section IV of the settlement agreement) is technologically straightforward: it’s essentially online e-book sales with contextual advertising. It’s books and page images, and it’s roughly the same thing as Amazon.com has been doing for years.
The far more interesting part is Section IV.7, where the parties list several hypothetical future business models. Some of them, such as custom publishing and compilations, require that Google use content that is logically structured and not just page images (hence the motivation for XML technology). These business models also require that the independent Book Rights Registry (BRR), which the settlement will establish with about US $30 million from Google, keep track of rights to pieces of content that may be smaller than entire books.
This has a boatload of implications, and I wonder just how far Google has thought them through. The libraries that currently feed Google with text from scanned books can’t possibly supply the content that Google would need to implement these business models, namely logically structured content in XML. Publishers will have to supply it themselves. Many will not be able to do this without building new content infrastructure or engaging a service.
This is a big deal. The “other side” of my consulting practice is in helping media companies plan out such infrastructure, along with attendant process and organizational changes; it’s not a trivial thing to do.
An even bigger question is about motivation. The BRR can do business with any entity, not just Google, that wishes to offer services based on publishers’ content. So it could well be motivated to track rights to things other than entire books.
But will Google really go into the business of selling content at the level of chapters, sections of chapters, individual entries in reference publications, or other smaller units of content? This will put Google into the rather curious business of making money from certain types of digital content while not making the same money from others.
Google can rationalize its 30% revenue share on e-book sales because they’re “just books” (online versions of “legacy” print publications) and because Google is supplying the viewer application and (<ahem>) the DRM. But if Google starts selling content in standard online formats that can be viewed in web browsers or other widely available readers, then it starts to feel different.
Net neutrality begins to come to mind. Book content will have gone from being unfindable (legally) online to being content that Google is motivated to push as “premium.” Google may not actually cook its search engine rankings in order to favor content from which it makes direct revenue. But consider that the BRR can clear rights to the same content for any service provider that wants to use it. Google will be in competition with those service providers, which may have no “neutrality” issue.
Will Google be tempted to use “extra methods” to draw traffic to content licensed from book publishers? Or will it decide that it’s just not worth the effort to build the infrastructure necessary to launch these new content business models at all?
Well, if Google doesn’t want to, then someone else will. The beauty of the lawsuit settlement is that it envisions an online content management and rights clearance infrastructure that could make it easier than ever to launch new business models based on publishers’ content. If this happens, then fears of Google taking over the economics of the publishing industry may not be well-founded after all. Instead, the publishing industry will truly be the better for it.
Thanks for the preview of your TOC panel presentation (and for the kind words about the conference). It sounds like it’s going to be a useful and illuminating discussion.
strictly speaking, libraries are not providing text; they provide books, which google digitizes. (libraries also provide some metadata, which is combined by google with other metadata, some of which they license). google OCRs the images, and of course could generate all manner of XML data from that, including structural information. in fact, they already do this.
what publishers might provide for 2009+ books is another topic altogether; there are many issues relating to whether the BRR might cover future works as well as those under the settlement.
Peter, thanks for writing.
Of course Google has to OCR the text in order to index it. And they can intuit basic structural information such as paragraphs, chapter headings and so on, which could be used to create basic XML tags. But Google would need involvement from publishers to be able to tag logical structure beyond that – for example, to identify index terms that could be combined with terms from material from other publishers to be used in custom publications for the higher ed market. I’m questioning whether Google would want to be bothered with what is ultimately a series of niche opportunities.
[…] Bill Rosenblatt, one of the foremost experts in the field of digital rights management, offers an outlook on the business models made possible by Google’s digitization strategy, and also on how Google, through […]
Interesting analysis of this complex issue. Why do you think that the lawyers didn’t think the issue through – because the BRR can do business with everyone else and Google has to do the work (the structuring)?
Actually, there’s nothing that says that Google has to do the work. The beauty of the arrangement, as I said, is that if Google won’t do it, someone else can. Publishers can structure content themselves, or service providers can do it for them as part of an online content service they are offering, and the BRR can track usage.
Lawyers for publishers involved in the Google settlement are not business development or product management people; they aren’t going to weigh the costs and benefits of adopting certain business models. An ROI calculation was not a prerequisite to listing a particular business model in the settlement agreement. The future business models enumerated there are aspirational and non-binding.
Both sides have to decide if it’s worth the investment to create structured content with standard metadata. This is something that publishers have wrestled with for years, so it’s not a new problem to them. I just question whether Google — which pulls in orders of magnitude more revenue than most publishers — would consider the opportunities big enough to bother with.
[…] Rosenblatt has blogged about the white paper and about the settlement itself on his Copyright and Technology […]