Watermarking and Blockchain Challenges

Last time I talked about blockchain technology and its potential to revolutionize certain types of copyright-related transactions.  Now let’s talk about some challenges.  Even though the potential for blockchain applications in the copyright arena is high, it’s inevitable that many of the schemes being proposed will not pan out; that’s the nature of technology hype.

Some of the schemes I’ve heard about involve using blockchain to manage distribution of the content files themselves, not just identifiers and rights metadata. This just isn’t workable. Blockchains are distributed ledgers, meaning that every participant keeps a copy of the same data; putting media files on one would mean replicating every file to every participant. If you’re going to (for example) transfer ownership of a file from one party to another, then what belongs in the blockchain is information about the transfer: what was transferred and under what terms and conditions. The files are the assets being transferred; just because they are digital doesn’t mean it makes sense to store them in a blockchain.
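To make the distinction concrete, here is a minimal Python sketch of a hash-chained ledger. It is a toy, not any real blockchain’s data model, and the record fields (`content_id`, `content_sha256`, `terms`) are hypothetical. Each entry holds a SHA-256 digest of the file plus the terms of the transfer, so each transaction costs a few hundred bytes on the ledger no matter how large the media file is.

```python
import hashlib
import json


class Ledger:
    """Toy append-only ledger: each block commits to the hash of the block before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _hash(prev_hash: str, record: dict) -> str:
        body = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def append(self, record: dict) -> str:
        prev_hash = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        block_hash = self._hash(prev_hash, record)
        self.blocks.append({"prev": prev_hash, "record": record, "hash": block_hash})
        return block_hash

    def verify(self) -> bool:
        """Recompute every hash; editing any earlier record breaks the chain."""
        prev_hash = self.GENESIS
        for block in self.blocks:
            if block["prev"] != prev_hash or self._hash(prev_hash, block["record"]) != block["hash"]:
                return False
            prev_hash = block["hash"]
        return True


def transfer_record(file_bytes: bytes, content_id: str, seller: str, buyer: str, terms: str) -> dict:
    """Describe a transfer. Only the file's SHA-256 digest (64 hex characters) goes on the ledger."""
    return {
        "content_id": content_id,
        "content_sha256": hashlib.sha256(file_bytes).hexdigest(),
        "from": seller,
        "to": buyer,
        "terms": terms,
    }


if __name__ == "__main__":
    media = bytes(10_000_000)  # stand-in for a 10 MB media file
    ledger = Ledger()
    ledger.append(transfer_record(media, "EXAMPLE-ID-0001", "alice", "bob", "resale"))
    print(len(json.dumps(ledger.blocks)), "bytes on the ledger;", len(media), "bytes of media")
    print("chain valid:", ledger.verify())
```

Appending the record does nothing to the seller’s own copy of `media`: the ledger can show that a transfer happened, but not that the seller kept nothing back.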

For example, there have been suggestions that blockchain provides a solution to the problem of digital first sale. The theory is that you should be allowed to resell (or give away) your digital files if there is a way of proving that, once you have transferred them to the buyer, you no longer have them. Back in 2001, the U.S. Copyright Office stated that this would require a reliable “forward and delete” technology, and that because no such technology existed, the law should not be changed to recognize digital first sale. Earlier this year, the Patent and Trademark Office concluded that this is still the case.

Because the blockchain can only store information about a resale transaction, there’s normally no way of ensuring that the reseller (or giver-away) hasn’t stashed away copies of files before the transaction takes place. In other words, the fact that transactions are recorded in a blockchain doesn’t help make the system more trustworthy or diminish the opportunities for abuse.

Yet there is a way to provide some assurance about the digital files involved in blockchain-based transactions: establish robust links between the transactions and the files. The simplest form of this is to use a unique content identifier in the transaction (which gets stored in the blockchain) and embed that same identifier in the content file itself. This is analogous to ownership registries for physical assets such as cars, which carry standard 17-character Vehicle Identification Numbers (VINs). When you sell your car, it’s registered by VIN to the new owner in a government-controlled vehicle registration database.
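Here is a minimal sketch of that link, assuming a simple in-memory registry keyed by content identifier. The `Registry` class, its methods, and the identifier strings are hypothetical. The registry plays the role of the vehicle registration database: it answers “who currently owns the item with this identifier?”, and an identifier read out of a file can be checked against it.

```python
class RegistryError(Exception):
    """Raised when a registration or transfer is not allowed."""


class Registry:
    """Hypothetical title registry keyed by content identifier, like a VIN database."""

    def __init__(self):
        self.owners = {}    # content_id -> current owner
        self.history = []   # (content_id, previous owner or None, new owner)

    def register(self, content_id: str, owner: str) -> None:
        if content_id in self.owners:
            raise RegistryError(f"{content_id} is already registered")
        self.owners[content_id] = owner
        self.history.append((content_id, None, owner))

    def transfer(self, content_id: str, seller: str, buyer: str) -> None:
        # As with a car title, only the current registered owner can sell.
        if self.owners.get(content_id) != seller:
            raise RegistryError(f"{seller} is not the registered owner of {content_id}")
        self.owners[content_id] = buyer
        self.history.append((content_id, seller, buyer))

    def check_file(self, embedded_id: str, holder: str) -> bool:
        """Compare an identifier read out of a file with the registry's records."""
        return self.owners.get(embedded_id) == holder


if __name__ == "__main__":
    reg = Registry()
    reg.register("EXAMPLE-ID-0001", "alice")
    reg.transfer("EXAMPLE-ID-0001", "alice", "bob")
    print(reg.check_file("EXAMPLE-ID-0001", "bob"))    # True: bob holds the registered title
    print(reg.check_file("EXAMPLE-ID-0001", "alice"))  # False: alice sold it
```

The check is only as good as the identifier embedded in the file: if that identifier can be edited or removed, `check_file` has nothing reliable to compare.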

This requires two things: a unique identifier scheme, and a way to ensure that the identifier travels permanently and unalterably with the file. Unfortunately, because files are digital data, this is easier said than done.  Simply putting identifiers in file headers is no good, as header data is easy to change, delete, or ignore.  This is the bane of the stock image industry’s existence, and anyone who has used Google Image Search to find photos, company logos, or other graphics for use in their PowerPoint presentations should understand the problem.
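To see how little protection header metadata offers, here is a sketch using PNG `tEXt` chunks, one real place where identifiers and copyright notices get stored. It builds a tiny grayscale PNG carrying a text entry under a keyword of our choosing (`Identifier` is an arbitrary, hypothetical keyword), then rewrites the file without that chunk. The compressed image data comes through byte-for-byte identical, and nothing about the picture reveals that an identifier was ever there.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, then CRC-32 over type + data."""
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", zlib.crc32(ctype + data))


def make_png(width: int, height: int, pixels: bytes, text: dict) -> bytes:
    """Build an 8-bit grayscale PNG carrying `text` as tEXt chunks."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with filter type 0 (None), followed by its pixel bytes.
    raw = b"".join(b"\x00" + pixels[y * width:(y + 1) * width] for y in range(height))
    text_chunks = b"".join(chunk(b"tEXt", k.encode("latin-1") + b"\x00" + v.encode("latin-1"))
                           for k, v in text.items())
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr) + text_chunks
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))


def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk after the signature."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC


def text_metadata(png: bytes) -> dict:
    """Collect tEXt keyword/value pairs."""
    pairs = (data.partition(b"\x00") for ctype, data in iter_chunks(png) if ctype == b"tEXt")
    return {k.decode("latin-1"): v.decode("latin-1") for k, _, v in pairs}


def image_data(png: bytes) -> bytes:
    """Concatenated IDAT payload, i.e. the compressed pixels."""
    return b"".join(data for ctype, data in iter_chunks(png) if ctype == b"IDAT")


def strip_text(png: bytes) -> bytes:
    """Rewrite the file without its tEXt chunks; every other chunk is re-serialized unchanged."""
    return PNG_SIGNATURE + b"".join(chunk(t, d) for t, d in iter_chunks(png) if t != b"tEXt")


if __name__ == "__main__":
    original = make_png(4, 4, bytes(range(0, 256, 16)), {"Identifier": "EXAMPLE-ID-0001"})
    stripped = strip_text(original)
    print(text_metadata(original))   # {'Identifier': 'EXAMPLE-ID-0001'}
    print(text_metadata(stripped))   # {}
    print(image_data(original) == image_data(stripped))  # True
```

Any image editor or upload pipeline that re-saves a file without its ancillary chunks does the same thing, usually without anyone intending it.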

The best way of binding identifiers to digital content is with digital watermarks — data that’s embedded in digital content files in such a way that the data survives normal transformations (cropping, downsampling, format conversion, etc.), does not affect the sound or appearance of the content to humans, and is difficult to remove from the file without damaging its sound or appearance.
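For readers who want to see why a watermark can survive transformations that would wipe out a header, here is a minimal sketch of additive spread-spectrum watermarking, one classic textbook technique. It assumes the content is a plain list of samples and that embedder and detector share a secret integer key; commercial products (whose algorithms are mostly proprietary) add perceptual shaping, synchronization, and error correction, none of which is shown. Each payload bit is spread across thousands of samples as a faint pseudo-random pattern, and detection correlates the file against that pattern, so moderate noise and gain changes average out.

```python
import random


def _chips(key: int, n_bits: int, chips_per_bit: int) -> list:
    """Key-seeded pseudo-random +1/-1 pattern; without the key it looks like noise."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n_bits * chips_per_bit)]


def embed(signal: list, bits: list, key: int, chips_per_bit: int = 4096, strength: float = 0.1) -> list:
    """Spread each payload bit over `chips_per_bit` samples as a low-amplitude +/- pattern."""
    if len(signal) < len(bits) * chips_per_bit:
        raise ValueError("signal too short for this payload")
    chips = _chips(key, len(bits), chips_per_bit)
    marked = list(signal)
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for j in range(i * chips_per_bit, (i + 1) * chips_per_bit):
            marked[j] += strength * sign * chips[j]
    return marked


def extract(signal: list, n_bits: int, key: int, chips_per_bit: int = 4096) -> list:
    """Correlate each segment with the keyed pattern; the sign of the sum recovers the bit."""
    chips = _chips(key, n_bits, chips_per_bit)
    bits = []
    for i in range(n_bits):
        seg = range(i * chips_per_bit, (i + 1) * chips_per_bit)
        corr = sum(signal[j] * chips[j] for j in seg)
        bits.append(1 if corr > 0 else 0)
    return bits


if __name__ == "__main__":
    rng = random.Random(1)
    host = [rng.gauss(0.0, 1.0) for _ in range(32 * 4096)]  # synthetic stand-in for audio samples
    payload = [rng.randint(0, 1) for _ in range(32)]        # e.g. 32 bits of an identifier
    marked = embed(host, payload, key=2024)
    # Simulate a lossy transformation: a gain change plus added noise.
    degraded = [0.8 * x + rng.gauss(0.0, 0.3) for x in marked]
    print(extract(degraded, 32, key=2024) == payload)
```

In this toy, spreading each bit over 4,096 samples is what buys robustness, and it is also what limits capacity: the 32-bit payload already occupies 131,072 samples, about three seconds of 44.1 kHz audio.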

The data capacity of a robust watermark is limited to a few dozen bytes, so it’s well suited to content identifier schemes such as ISRCs in music and ISANs and EIDRs in audiovisual content. Watermarks for images, audio, and video can be very hard to remove without damaging the sound or appearance of the content; e-book watermarks, by contrast, are comparatively easy to strip out.
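To put “a few dozen bytes” in perspective: an ISRC is 12 characters (a two-letter country code, a three-character alphanumeric registrant code, a two-digit year, and a five-digit designation code). Stored as ASCII text that is 12 bytes. The sketch below uses an illustrative mixed-radix encoding, not something defined by the ISRC standard or by any watermarking vendor, to show that any code of that shape fits in 49 bits.

```python
import re

ISRC_PATTERN = re.compile(r"([A-Z]{2})([A-Z0-9]{3})([0-9]{2})([0-9]{5})")
ALNUM = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Number of distinct 12-character codes matching the pattern, and the bits needed to hold one.
ISRC_SPACE = 26**2 * 36**3 * 10**7
PAYLOAD_BITS = (ISRC_SPACE - 1).bit_length()


def pack_isrc(isrc: str) -> int:
    """Mixed-radix encoding of an ISRC (country, registrant, year, designation)."""
    m = ISRC_PATTERN.fullmatch(isrc.replace("-", "").upper())
    if m is None:
        raise ValueError(f"not a well-formed ISRC: {isrc!r}")
    country, registrant, year, designation = m.groups()
    value = 0
    for ch in country:                          # two letters, base 26
        value = value * 26 + (ord(ch) - ord("A"))
    for ch in registrant:                       # three alphanumerics, base 36
        value = value * 36 + ALNUM.index(ch)
    value = value * 100 + int(year)             # two-digit year
    return value * 100_000 + int(designation)   # five-digit designation code


def unpack_isrc(value: int) -> str:
    """Invert pack_isrc, returning the compact 12-character form."""
    if not 0 <= value < ISRC_SPACE:
        raise ValueError("value out of range")
    value, designation = divmod(value, 100_000)
    value, year = divmod(value, 100)
    registrant = ""
    for _ in range(3):
        value, r = divmod(value, 36)
        registrant = ALNUM[r] + registrant
    country = ""
    for _ in range(2):
        value, c = divmod(value, 26)
        country = chr(ord("A") + c) + country
    return f"{country}{registrant}{year:02d}{designation:05d}"


if __name__ == "__main__":
    code = "US-ABC-16-12345"  # made-up, well-formed ISRC for illustration
    packed = pack_isrc(code)
    print(PAYLOAD_BITS)         # 49
    print(unpack_isrc(packed))  # USABC1612345
```

Seven bytes for the identifier would leave the rest of a small watermark payload free for other uses, such as error correction.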

Watermarks have other advantages besides robustness.  Multiple files can exist that contain the same content but different identifiers, which could (for example) denote different rights granted on the same content.
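A small sketch of that idea, assuming a hypothetical registry keyed by per-copy watermark identifiers. `COPY-0001`, `WORK-ABC`, and the rights vocabulary are made up for illustration. Both copies resolve to the same underlying work, but the identifier embedded in each one determines what its holder may do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyRecord:
    work_id: str       # the underlying content; identical for every copy
    rights: frozenset  # what the holder of this particular copy may do


# Hypothetical registry keyed by the identifier watermarked into each copy.
REGISTRY = {
    "COPY-0001": CopyRecord("WORK-ABC", frozenset({"stream"})),
    "COPY-0002": CopyRecord("WORK-ABC", frozenset({"stream", "download", "resell"})),
}


def permitted(copy_id: str, action: str) -> bool:
    """Look up the identifier extracted from a file and check one requested use."""
    record = REGISTRY.get(copy_id)
    return record is not None and action in record.rights


if __name__ == "__main__":
    print(REGISTRY["COPY-0001"].work_id == REGISTRY["COPY-0002"].work_id)  # True: same content
    print(permitted("COPY-0001", "resell"), permitted("COPY-0002", "resell"))  # False True
```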

The big disadvantage of watermarks is that someone has to put them there in the first place. If copies of a file are already “in the wild,” it’s too late to watermark them. This has led to the use of fingerprinting, or automated content recognition, techniques such as the Content ID scheme that Google uses for YouTube, the Audible Magic scheme that the Copyright Alert System (among others) uses to detect alleged piracy on file-sharing networks, and Shazam’s popular “name that tune” mobile apps. Fingerprinting takes “educated guesses” at the identity of content by distilling its essential characteristics and looking those characteristics up in a database. This doesn’t require any pre-processing of files, but it isn’t 100% accurate in identifying content, and it assigns the same identifier to every copy of a given content item.
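A toy version of that lookup follows, loosely modeled on the published idea of fingerprinting audio by how its energy changes over time. It is not how any of the commercial systems named above actually work, and the frame size, 63-bit fingerprint, and distance threshold are arbitrary assumptions. Each signal is reduced to a bit string, and identification is a nearest-match search by Hamming distance.

```python
import random

FRAME = 1024


def fingerprint(signal: list, frame: int = FRAME) -> int:
    """Toy fingerprint: one bit per frame boundary, set when loudness (energy) rises."""
    energies = [sum(x * x for x in signal[i:i + frame])
                for i in range(0, len(signal) - frame + 1, frame)]
    bits = 0
    for prev, cur in zip(energies, energies[1:]):
        bits = (bits << 1) | (1 if cur > prev else 0)
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def identify(signal: list, database: dict, max_distance: int = 10):
    """Return the ID whose reference fingerprint is nearest, or None if none is close enough."""
    fp = fingerprint(signal)
    best_id, best_dist = None, max_distance + 1
    for content_id, ref in database.items():
        d = hamming(fp, ref)
        if d < best_dist:
            best_id, best_dist = content_id, d
    return best_id


def synthetic_work(seed: int, frames: int = 64, frame: int = FRAME) -> list:
    """Stand-in for a recording: noise whose loudness changes from frame to frame."""
    rng = random.Random(seed)
    samples = []
    for _ in range(frames):
        amp = rng.uniform(0.5, 2.0)
        samples.extend(amp * rng.gauss(0.0, 1.0) for _ in range(frame))
    return samples


if __name__ == "__main__":
    works = {f"WORK-{i}": synthetic_work(i) for i in range(5)}
    database = {wid: fingerprint(sig) for wid, sig in works.items()}
    rng = random.Random(99)
    # Two copies of WORK-2 carrying different (toy) watermarks, i.e. different samples.
    copy_a = [x + 0.1 * rng.choice((-1.0, 1.0)) for x in works["WORK-2"]]
    copy_b = [x + 0.1 * rng.choice((-1.0, 1.0)) for x in works["WORK-2"]]
    print(identify(copy_a, database), identify(copy_b, database))
```

The two copies differ sample by sample, yet both resolve to `WORK-2`: the fingerprint identifies the content, not the copy. The `max_distance` threshold is where the guessing comes in; set it too tight and degraded copies go unrecognized, too loose and different works start to collide.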

In addition, watermarking requires specific algorithms and other technological components to embed, detect, and extract data from files. Most such schemes are proprietary and cost money to license, and different vendors’ schemes are incompatible with one another. Finally, watermarking will only succeed if it’s accompanied by online registries of rights information that can be looked up by standard identifiers. Most standard identifiers have associated online registries, but those registries don’t store rights information.

Still, watermarking seems like the best way to extend accountability in blockchain-based transactions to the media files themselves. So far, none of the burgeoning blockchain initiatives I’ve heard about has embraced watermarking; we’ll see if that changes, or if any watermarking technology vendors develop blockchain-friendly offerings.
