STM Publishing’s Identity Management Problem

Most of the talk about copyright and technology issues in the world of scientific, technical, and medical (STM) publishing these days focuses on two issues: Open Access and Sci-Hub. For STM publishers, these represent the Scylla and Charybdis of losing control over copyrights. Open Access is about replacing paywalls and traditional copyright licensing with Creative Commons terms (we make it available to you without logins); Sci-Hub is emblematic of piracy (you get it elsewhere without permission).

The music industry was confronted with the same two extremes (more or less) and found success in a middle ground: getting users to pay for services that provide access to just about all content and are more convenient than piracy. The STM publishing industry has been taking steps towards the same goal. But they’re baby steps at best. STM publishers are stymied by a single structural fact about their publishing environment: there is no good way to manage users’ identities as there is for music or trade ebooks, nor is there likely to be in the near future.

The latest piece of evidence of the structural limitations of the STM world is the recent announcement, by several of the largest STM publishers, of a consortium called GetFTR (Get Full Text Research). GetFTR plans to introduce a new standardized way to help researchers get access to publishers’ content. But as we’ll see, GetFTR falls far short of the seamless identity management scheme that the publishers really want.

The problem that GetFTR purports to solve is the lack of consistency and reliability in authenticating yourself to a publisher when you want to read a journal article. This problem originally arose at pharmaceutical companies, which are major consumers of STM journal content. Many pharma companies still use a technique for authenticating users that dates back twenty years: IP address ranges. When a user requests an article, the publisher checks whether the computer's IP address falls within a range provided by the corporate licensee. This technique was OK in the 1990s, when the PCs on corporate networks had fixed IP addresses, but it has many limitations now: IP addresses can change under dynamic IP addressing; companies can change their network architectures (particularly when they are bought and sold); and users can't get access when they aren't on their office PCs.
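The IP-range check described above can be sketched in a few lines of Python with the standard `ipaddress` module. This is a minimal illustration, not any publisher's actual code: the licensee name and address ranges are hypothetical (they come from blocks reserved for documentation), and a real system would keep the ranges in a licensing database. It shows why the scheme is brittle: access depends entirely on where the request comes from, not on who the user is.

```python
import ipaddress
from typing import Optional

# Hypothetical licensee and IP ranges, as a publisher might record them.
# The addresses are reserved documentation blocks (RFC 5737), not real networks.
LICENSEE_RANGES = {
    "ExamplePharma": [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("198.51.100.0/25"),
    ],
}


def licensee_for(client_ip: str) -> Optional[str]:
    """Return the licensee whose registered range contains client_ip, or None."""
    addr = ipaddress.ip_address(client_ip)
    for licensee, networks in LICENSEE_RANGES.items():
        # Membership test: is the address inside any registered network?
        if any(addr in net for net in networks):
            return licensee
    return None


print(licensee_for("192.0.2.45"))   # office PC on a registered range: ExamplePharma
print(licensee_for("203.0.113.7"))  # same user on a home connection: None
```

If the company renumbers its network, or the user connects from home, the address falls outside the registered ranges and access is denied, even though the person and the subscription are unchanged.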

A related problem for academic users is multiple identity providers. Researchers often have more than one institutional affiliation, such as doctors who teach at a medical school and work at a separate hospital or lab. Sometimes one institution subscribes to a journal while the other doesn’t. In that case, the user is often confronted with a “How would you like to authenticate yourself?” message; she will click on one choice only to be told that there is no valid subscription, then try another.

The other problem is that each publisher has its own way of asking for credentials. For example, in my research I typically use sites like IEEE (Xplore), Elsevier (ScienceDirect), and Springer (SpringerLink); each has its own authentication process with a different look and feel, and I have to enter a different username and password for each one. The closest I get to seamless identity management is when using my membership in the New York Public Library to get access to databases such as EBSCO and ProQuest, each of which requires me to enter the same login information into a different web page.

This does not make for a smooth user experience, and it’s certainly one reason why sites like Sci-Hub and ResearchGate are popular places to find scholarly content (price, of course, is the other main reason).

There is an obvious solution to this problem, but the STM ecosystem isn’t likely to provide it: a single point of authentication that gives access to all (or substantially all) content. This is how it’s done in music and trade book publishing: you log in to Spotify, Apple Music, Amazon Music Unlimited, Tidal, Deezer, etc., and you have access to substantially all recorded music; you log in to your Kindle, Kobo, Nook, or Apple Books app and you have access to substantially all ebook titles.

For this solution to be feasible in STM publishing, there wouldn't need to be a Spotify or Kindle for journal articles, i.e., a single repository of authorized STM journal content (a/k/a a legal Sci-Hub). It would just be necessary to have a single identity provider that integrates seamlessly with each publisher's authentication system. A closer analogy would be the UltraViolet "rights locker" system that the major movie studios launched in 2010 and then abandoned last year: users established accounts with the UltraViolet system; participating retailers handled transactions and fulfilled content; and once you had bought content from one retailer, you could get it in another format from another retailer for free or for a small upcharge.

The problem is that there is no central identity provider for STM content, and many people don’t want such a thing to exist–or at least don’t want their competitors to get the job. The leading candidates for this role right now include ResearchGate and Mendeley, which are both combinations of research portals and (for lack of a better term) social networks for researchers–sort of like LinkedIn for researchers.

ResearchGate is an independent company that hoovers up academic research content through questionable social-engineering tactics: it appeals to authors' egos by telling them how important their work is and how valuable it would be if the world could only see their articles. It invites them to upload the full text, whether or not they own the copyrights (which publishers typically own and authors typically don't), and makes it available in a central repository. The same STM publishers that sued Sci-Hub have sued ResearchGate for copyright infringement.

Mendeley doesn’t host unauthorized content in a central library as ResearchGate does, though it does support private sharing of content among members of small groups. But Mendeley is owned by Elsevier, the largest STM publisher in the world (Elsevier acquired it in 2013). Elsevier competitors such as Wiley and SpringerNature don’t want it to be the de facto repository for researchers’ identities. Beyond that, some people–including many universities–are more generally concerned about privacy implications if any single entity controls or has access to a critical mass of researchers’ identities. We’ve all seen what happens when a company like Google, Facebook, or Microsoft aggregates internet identities.

Yet without a single identity hub, it's hard to see how access to scholarly content can be made truly seamless. The academic publishing community has known about this problem for several years and has been working on solutions through federated identity schemes, which map individuals' identities to multiple identity providers. There is a widely used personal identifier scheme for researchers called ORCID, but there is no built-in way to link your ORCID ID to your institutional affiliations (you have to link them by hand). Several independent nonprofit federated identity managers exist, such as InCommon.org, but they aren't universal.

Standards work on federated identity is also underway; a current example is RA21, a project driven by the International Association of Scientific, Technical, and Medical Publishers (STM) trade association and the National Information Standards Organization (NISO) standards body. Federated identity is a tough problem to solve in a truly seamless and universal way; an attempt at federated identity for Internet users, the Liberty Alliance, collapsed under its own weight in 2009 as centralized identity managers like Google and Facebook were rising.

This brings us to GetFTR. GetFTR arose out of RA21, but it’s not actually a federated identity management scheme. Instead, it provides a one-time “identity provider discovery experience,” with a standardized look and feel, that enables users to store their preferred identity providers in their browsers so that they don’t have to remember their choice of identity provider later. For example, it might establish that NYU is your identity provider; then later on, you can simply log in to NYU’s single sign-on (SSO) system directly instead of being asked to choose an identity provider and then log on to it every time you want to access journal content. But GetFTR doesn’t store the identities themselves; you have to supply them every time.
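To make the distinction concrete, here is a minimal Python sketch of what "remembering a preferred identity provider" amounts to. It is a hypothetical model, not GetFTR's or any other service's actual code or API: browser storage is modeled as a plain dictionary, and the key name, function names, and the second institution are made up for illustration. The point is what gets stored: only the name of the provider, never the user's credentials.

```python
from typing import Callable, Dict, List

# Hypothetical storage key; a real service would use the browser's own storage.
PREFERRED_IDP_KEY = "preferred_idp"


def discover_idp(
    browser_storage: Dict[str, str],
    available_idps: List[str],
    ask_user: Callable[[List[str]], str],
) -> str:
    """Return the identity provider to send the user to.

    Only the *choice* of provider is remembered; no username or password
    is stored, so the user still signs in at the provider each time.
    """
    remembered = browser_storage.get(PREFERRED_IDP_KEY)
    if remembered is not None and remembered in available_idps:
        return remembered  # skip the "choose your institution" step
    choice = ask_user(available_idps)  # one-time discovery step
    browser_storage[PREFERRED_IDP_KEY] = choice
    return choice


# Usage: two separate browsers, each with its own (initially empty) storage.
laptop: Dict[str, str] = {}
phone: Dict[str, str] = {}
idps = ["NYU", "Example Hospital"]  # "Example Hospital" is hypothetical
prompts_shown: List[List[str]] = []


def pick_nyu(options: List[str]) -> str:
    prompts_shown.append(options)  # record that the chooser was displayed
    return "NYU"


first = discover_idp(laptop, idps, pick_nyu)   # chooser shown, "NYU" saved
second = discover_idp(laptop, idps, pick_nyu)  # remembered, no chooser
other = discover_idp(phone, idps, pick_nyu)    # new browser, chooser again
```

Because the remembered choice lives in one browser's storage, the second browser starts out empty and shows the chooser again, and in every case the user still has to sign in at the provider.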

To understand this, compare it to the identity and authentication experience that you get through Google, Facebook, or Twitter. Many websites allow you to sign up for them using your Google, Facebook, or Twitter IDs instead of establishing identities directly with them. If you want to sign up, you simply click (for example) “Sign Up with Google”; your browser knows your Google ID, and the rest is seamless.

GetFTR doesn't go that far. It also has its own limitations, notably that it applies only to one browser on one device (you'd have to set it up for every browser on every device you use), and that you might have different preferred identity providers in different situations.

The tradeoff for websites that offer Sign Up with Google/Facebook/Twitter is that they give up control of their users' identities in exchange for a quicker path to signing users up. It's a tradeoff that no one in STM publishing wants to make. Until someone does, and to the extent that publishers don't embrace Open Access (another subject for another day), the STM publishing community is likely to be stuck in a world where the easiest access to its content is through unauthorized means.

Hat tips to Heather Flanagan of SeamlessAccess.org and Joe Esposito of Clarke & Esposito for help with this.
