Around the globe, there are initiatives and organizations devoted to bringing “Open Access” to the world, i.e., making scholarly research works publicly available free of charge. However, the current debate largely misses the point that human readers (content mining remains a problem) have already enjoyed such public access to the vast majority of scholarly works since about 2013, thanks to several technical developments that provide temporary work-arounds for publisher paywalls.

For various reasons, people (including many long-time OA activists) still publicly claim that we need open access, when what we actually need is a different kind of open access from the one we already enjoy. The core of the access problem itself has effectively been solved for the last 6-7 years, but (likely) only temporarily.

Of course, this realization dramatically changes the whole issue. For the last 6-7 years, paying for subscriptions has ceased to be necessary for access. One sign of the changing times is the support that initiatives such as DEAL, Bibsam, etc. enjoy: two years without subscriptions to Elsevier, and what do you hear out of, e.g., Germany? Crickets! Nothing! Of course, it would be silly to conclude that in these two years nobody in Germany has read any Elsevier articles. The reason for the silence and the continued support for DEAL is that we can now access anything we want without subscriptions. The old adage that “everybody who needs access has access”, wrong prior to 2012 because of subscriptions, is now finally true despite subscriptions! The strong negotiating position of DEAL et al. would not have been possible, or even thinkable, prior to 2012, and the single reason is that subscriptions have been rendered redundant for access.

Not only has the access problem decreased dramatically in size and prevalence; other problems have since surfaced that would loom much larger than access even if there had been no such technical developments.

  1. The reliability of the scientific literature appears to be much lower than expected – and what use is an unreliable literature, however accessible to the public? In particular, the publish-or-perish culture centered around journal rank rewards unreliable science and punishes meticulous scientists, acting as a major socioeconomic driver for what some already call a replication crisis.
  2. Moreover, with the advent of APC-OA, the problem of affordability has come to the fore for scholars as well, whereas before it was largely a library problem. Publishing an article costs under €500, but prices of more than twenty times that (e.g., at Nature-branded journals and others) scare the scholarly community: in the future, will only rich labs/individuals/institutions be able to afford publishing in the prestigious journals without which nobody can survive in academia? Given that subscription costs are largely opaque and subscriptions themselves are no longer necessary, there is of course huge resistance to something that is bound to make things worse, not only from the point of view of authors. Not surprisingly, people have a hard time understanding why such a change is needed.
  3. Finally, while billions are being spent on subscriptions that nobody needs any more, hardly anything is spent on the kinds of infrastructure that are crucial for our work: databases, code-sharing sites, etc. Scholarship lacks even the most basic functionalities for some of the most crucial fruits of our labor: text, data and code.

The main issues that any modernization of the scholarly infrastructure needs to address today thus make up the RAF crisis: Reliability, Affordability and Functionality. Approach them with a modern infrastructure solution, and the kind of access to the scholarly literature we currently enjoy will be perpetuated as a side effect.

In Europe, the lack of functionalities has been acknowledged, in particular for research data and now, slowly, also for scientific code and software. In part, the European Open Science Cloud (EOSC) is intended to address these problems. However, what we need is not a piecemeal hodgepodge of stand-alone computational solutions (the direction EOSC appears to be headed right now), but a seamless infrastructure in which we can integrate data and code into our texts. And this is where scholarly publishing can no longer be seen as a stand-alone problem, but must be treated as an integral part of a large-scale infrastructure crisis facing text, data and code, with a core focus on reliability, affordability and functionality.

Taking the above together, it becomes clear that one of the major obstacles to infrastructure reform on the decision-maker side is probably that EOSC on the one hand and DEAL, Plan S and the other initiatives on the other are seen, and act, as if they were addressing separate problems.

With the realization that EOSC, Plan S, DEAL, etc. are actually working on different aspects of the same issue, the problem to be solved is no longer that scholars publish in toll-access journals, but that institutions haven’t come up with a more attractive alternative. If individuals are not to blame, then there is no reason to mandate that they do anything differently. Instead, institutions should be mandated to stop funding journals via subscriptions or APCs and instead invest the money in a modern, more cost-effective infrastructure for text, data and code. Obviously, stated this specifically, such a mandate is nearly impossible in most countries. However, there is a mandate that comes very close. It has been dubbed “Plan I” (for infrastructure). In brief, it entails a three-step procedure:

  1. Build on already available standards and guidelines to establish a certification process for a sustainable scholarly infrastructure
  2. Funders require institutional certification before reviewing grant applications
  3. Institutions use subscription funds to implement infrastructure for certification

Many or most funding agencies already have (largely unenforced) infrastructure requirements, so step one is halfway done already. Step two is just the required enforcement step, and step three will follow out of necessity, as few public institutions will have the funds available to implement the certification quickly. If deadlines were short and funders recommended using subscription/APC funds for the implementation, the money could be shifted rapidly from legacy publishing to service provision.

In fact, this system is already working for some sub-disciplines; it just needs to be expanded. I was able to observe how effective it is at my own university: before considering applications for the next-generation genome sequencing machines needed by our biology and medicine departments, the DFG requires applicants to certify that they work at an institution with a so-called ‘core facility’ to handle the massive amounts of data generated by these machines (the equivalent of point 2 in the three steps above). The DFG has a very detailed list of requirements for such facilities in terms of hardware and staffing (the equivalent of point 1 above). There is now a high-level task force within the two departments to find and shift funds and staff (point 3 above) to create four permanent positions and implement the computational infrastructure before a single line of an application has even been written. This example shows that the three steps outlined above are already happening around the world at many funding agencies and merely have to be expanded to cover all fields of scholarship. It was this overt activism, the sudden flow of funds and creation of positions where there is usually a chronic shortage of both, that prompted the idea for Plan I. Institutions will move heaven and earth to keep research funds flowing. If funders say “jump!”, institutions ask “how high?”. In this case, institutions have both the expertise and the funds (both within their libraries) to implement these modern technologies quickly and painlessly – it should be in the self-interest of any funding agency to help them set the correct priorities.

Such funder requirements would tackle all three main infrastructure problems head on: they would promote the reliability of science by doing away with journals, and with them journal rank, which rewards unreliable science and punishes reliable science. They would approach the affordability problem by introducing open, standards-based competition and substitutability into a largely monopoly-based market. In fact, the European Commission’s Directorate General for Competition has explicitly suggested such measures for initiatives such as EOSC and Plan S. Finally, they would bring many new functionalities not only to our text-based narratives, but also to our audio and visual narratives and, most needed of all, provide stable and sustainable infrastructure for research data and code.

Oh, and of course, the text-based narratives, interactively combined with our data and code (e.g., via living figures), would be publicly accessible and machine-readable for content mining, as an added side-benefit.
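To make the idea of a “living figure” a bit more concrete, here is a minimal sketch in Python (the file name and column names are hypothetical, purely for illustration): instead of a static image, the article ships the data and the plotting code, and the figure is regenerated from them whenever the article is built or read.

```python
# Minimal sketch of a "living figure": the published figure is regenerated from
# data and code that are part of the article itself, rather than being a static
# image. File and column names below are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt


def build_figure(csv_path: str = "data/measurements.csv") -> plt.Figure:
    """Rebuild the article's figure from the dataset deposited with the text."""
    df = pd.read_csv(csv_path)  # data lives next to the narrative
    fig, ax = plt.subplots()
    for condition, group in df.groupby("condition"):
        ax.plot(group["trial"], group["value"], label=condition)
    ax.set_xlabel("trial")
    ax.set_ylabel("value")
    ax.legend()
    return fig


if __name__ == "__main__":
    # An article-build pipeline (or an interactive reader) would call this and
    # embed the result; updating the dataset updates the published figure.
    build_figure().savefig("figure1.svg")
```

The point is not the particular plotting library, but that text, data and code live in one infrastructure and reference each other, so an update to the data propagates to the figure and content mining can reach all three.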
