How much should a scholarly article cost the taxpayer?



In: science politics • Tags: infrastructure, publishing

tl;dr: It is a waste to spend more than the equivalent of US$100 in tax funds on a scholarly article.

Collectively, the world’s public purse currently spends the equivalent of roughly US$10b every year on scholarly journal publishing. Dividing that by the roughly two million articles published annually, you arrive at an average cost per scholarly journal article of about US$5,000.
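The arithmetic behind that average can be checked in a few lines. This is a minimal sketch using only the rounded estimates quoted above; the constant and function names are made up for illustration.

```python
# Rounded estimates from the paragraph above, not exact accounting data.
ANNUAL_SPEND_USD = 10_000_000_000   # ~US$10b spent on journal publishing per year
ARTICLES_PER_YEAR = 2_000_000       # ~2 million articles published per year


def average_cost_per_article(total_spend: float, n_articles: int) -> float:
    """Return the mean spend per article."""
    if n_articles <= 0:
        raise ValueError("n_articles must be positive")
    return total_spend / n_articles


print(average_cost_per_article(ANNUAL_SPEND_USD, ARTICLES_PER_YEAR))  # 5000.0
```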

Inasmuch as these legacy articles are behind paywalls, the average taxpayer does not get to see what they pay for. It is even worse for academics: not only are they unable to access all the relevant literature either, but their cash-strapped public institutions are also sorely missing the subscription funds, which could have modernized their digital infrastructure. Consequently, researchers at most public institutions are stuck with technology that is essentially from the 1990s, specifically with regard to infrastructure taking care of their three main forms of output: text, data and code.

Another pernicious consequence of this state of affairs: institutions have been stuck with a pre-digital strategy for hiring and promoting their faculty, namely judging them by the venues of their articles. As the most prestigious journals publish, on average, the least reliable science, but the scientists who publish there are rewarded with the best positions (and are, in turn, training their students how to publish their unreliable work in these journals), science is now facing a replication crisis of epic proportions: most published research may possibly be false.

Thus, both the scientific community and the public have more than one reason to try and free some of the funds currently wasted on legacy publishing. Consequently, there are a few new players on the publishing market who offer their services for considerably less. Not surprisingly, in developing countries, where cash is even more of an issue, a publicly financed solution (SciELO) was developed more than 15 years ago that publishes fully accessible articles at a cost of between US$70 and US$200, depending on various technical details. Over those 15 years, the problems have accumulated in the richer countries as well, prompting the emergence of new publishers. Some of these newer publishers and service providers, such as Scholastica, Ubiquity, RIO Journal, Science Open, F1000Research, PeerJ or Hindawi, quote a ballpark price range from just under US$100 to under US$500 per article. Representatives of all of these publishers independently tell me that their costs per article are in the low hundreds, and Ubiquity, Hindawi and PeerJ are even on record with this price range. [After this post was published, Martin Eve of the Open Library of the Humanities also quoted roughly these costs for their enterprise. I have also been pointed to another article that sets about US$300 per article as an upper bound, in line with all the other sources.]


Now, as a welcome confirmation, yet another company, Standard Analytics, arrives at similar costs in their recent analysis.

Specifically, they computed the ‘marginal’ costs of an article, which they define as only taking “into account the cost of producing one additional scholarly article, therefore excluding fixed costs related to normal business operations”. I understand this to mean that if an existing publisher wanted to start a new scholarly journal, these would be the additional costs they would have to recoup. The authors mention five main tasks to be covered by these costs:

1) submission

2) management of editorial workflow and peer review

3) typesetting

4) DOI registration

5) long-term preservation.

They calculate two versions of how these costs may accrue. One method is to outsource these services to existing vendors. Depending on the vendors chosen, the prices they calculate range from US$69 to US$318, hitting exactly the ballpark all the other publishers have been quoting for some time now. Given that public institutions are bound to choose the lowest bidder, anything above the equivalent of around US$100 would probably be illegal. Let alone 5k.
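To put the gap in perspective, the ratio between the current average and the quoted vendor prices can be computed directly. This is a rough sketch, assuming the rounded US$5,000 average from earlier in this post and the two ends of the vendor range; the function and variable names are hypothetical.

```python
# Figures taken from the post; all are rounded estimates.
LEGACY_COST_USD = 5_000      # current average spend per article
VENDOR_LOW_USD = 69          # cheapest outsourced vendor combination
VENDOR_HIGH_USD = 318        # most expensive outsourced vendor combination


def overpayment_factor(current_price: float, alternative_price: float) -> float:
    """Return how many times more the current price is than the alternative."""
    if alternative_price <= 0:
        raise ValueError("alternative_price must be positive")
    return current_price / alternative_price


for label, price in [("cheapest vendors", VENDOR_LOW_USD),
                     ("priciest vendors", VENDOR_HIGH_USD)]:
    factor = overpayment_factor(LEGACY_COST_USD, price)
    print(f"{label}: current spend is about {factor:.1f}x higher")
```

Even against the most expensive vendor combination, the current average comes out more than an order of magnitude higher.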

However, as public institutions are not (yet?) in a position to competitively advertise their publishing needs, let’s consider the side of the publisher: if you are a publisher with other journals and are shopping around for services to provide you with an open access journal, all you need to factor in is some marginal additional part-time editorial labor for your new journal and a few hundred dollars per article. Given that existing publishers charge, on average, around €2,000 per open access article, it is safe to say that, as in subscription publishing, scientists and the public are being had by publishers, as usual, even in the case of so-called ‘gold’ open access publishing. These numbers also show, as argued before, that just ‘flipping’ all our journals to open access is at best a short-term stop-gap measure. At worst, it would make the current situation even worse.

Be that as it may, I find Standard Analytics’ second calculation to be even more interesting. This calculation actually conveys an insight that was entirely new, at least for me: if public institutions decided to run the 5 steps above in-house, i.e., as part of a modern scholarly infrastructure, per article marginal costs would actually drop to below US$2. In other words, the number of articles completely ceases to be a monetary issue at all. In his critique of the Standard Analytics piece, Cameron Neylon indicated, with his usual competence and astuteness, that of course some of the main costs of scholarly communication aren’t really the marginal costs that can be captured on a per-article basis. What requires investment are, first and foremost, standards according to which scholarly content (text/audio/video: narrative, data and code) is archived and made available. The money we are currently wasting on subscriptions ought to be invested in an infrastructure where each institution has the choice of outsourcing vs. hiring expertise themselves. If the experience of the past 20 years of networked digitization is anything to go by, then we need to invest these US$10b/a in an infrastructure that keeps scholarly content under scholarly control and allows institutions the same decisions as they have in other parts of their infrastructure: hire plumbers, or get a company to show up. Hire hosting space at a provider, or put servers into computing centers. Or any combination thereof.

What we are stuck with today is nothing but an obscenely expensive anachronism that we need to dispense with.

By now, it has become quite obvious that we have nothing to lose, neither in terms of scholarly nor of monetary value, but everything to gain from taking these wasted subscription funds and investing them to bring public institutions into the 21st century. Conversely, every year we keep procrastinating, another US$10b goes down the drain and is lost to academia forever. In the grand scheme of things, US$10b may seem like pocket change. For the public institutions spending them each year, they would constitute a windfall: given that the 2m articles we currently publish would not even cost US$4m, we would have in excess of US$9.996b to spend each year on an infrastructure serving only a few million users. As an added benefit, each institution would be back in charge of its own budget decisions, rather than having to negotiate with monopolistic publishers. Given the price of labor, hardware and software, this would easily buy us all the bells and whistles of modern digital technology, with plenty to spare.
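The windfall figure follows from simple arithmetic. Here is a minimal sketch, assuming the rounded numbers used in this post and treating the "below US$2" in-house marginal cost as an upper bound of exactly US$2; the variable names are illustrative only.

```python
# Rounded estimates from the post.
ANNUAL_SPEND_USD = 10_000_000_000   # ~US$10b currently spent per year
ARTICLES_PER_YEAR = 2_000_000       # ~2 million articles per year
MARGINAL_COST_USD = 2               # assumed upper bound for in-house marginal cost

# Total marginal cost of publishing every article in-house.
publishing_cost = ARTICLES_PER_YEAR * MARGINAL_COST_USD

# Money left over each year for infrastructure.
freed_funds = ANNUAL_SPEND_USD - publishing_cost

print(f"Publishing cost: US${publishing_cost / 1e6:.0f}m")
print(f"Freed for infrastructure: US${freed_funds / 1e9:.3f}b")
```

Because US$2 is taken as an upper bound here, the actual publishing cost would be lower and the freed amount correspondingly higher, which is why the text says "in excess of".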


Posted on January 7, 2016 at 13:42

Pedro Beltrao

January 7, 2016, 19:33 | #

I agree that we are wasting money but again I disagree with the way you argue your point. In my opinion you tend to oversimplify the problems and in this way you lose audience. Academic publishing is not just the act of making things available to readers. If it was, then the cost would even be 0. My blog is free and I can post there as many documents as I ever want. I think you preach to the choir when you overstate and simplify problems like that. The same as when you say that most research is false. You don’t really believe this, I suppose? The linked article is more a conceptual discussion on multiple hypothesis testing than proof of any such thing. Is most of your research false?

Passing over these things, I agree that there is a large waste of money. Beyond strict dissemination costs there would also be costs associated with some form of creditation and filtering. It would be great to see some of the wasted money going towards innovations in these areas. I foresee that when we get to that point scientists will be complaining about whatever system we find to replace the current publishing mechanism. The root of most of our frustrations is the intense competition for the current level of resources. We will still be competing for the attention of our peers and for limited funding, grants and jobs.

Björn Brembs

January 8, 2016, 06:52 | #

Thanks for commenting!
I’m not sure what specifically you mean by ‘oversimplify’. In the first part, where I talk about the different organizations and their prices, I link to an earlier post where all the different steps that usually go into making a paper public are treated individually.
https://bjoern.brembs.net/2015/06/what-goes-into-making-a-scientific-manuscript-public/
Does your blog do the points 1-5? If I’m oversimplifying by stating that the costs are between 100-500 for RIO Journal, SciELO, F1000Research, Hindawi, Science Open and Standard Analytics, are these six organizations also oversimplifying? I’m only reporting what these people tell me.

The article I linked to about research being false is one of several by Ioannidis that uses data from many, many studies to estimate that ~60% of published findings are false. Actual replication studies from the last 5 years were *all* below that 40% replication mark. As the data stand right now, if anything, Ioannidis’ estimate from ten years ago of 40% replicability was too optimistic (but not by much). As to my own data – I don’t know. I always replicate my own data, but the key test is for others than myself or people in my lab to test it. My machines are very special and only few have them, so it will likely take some time until my work, if ever, is replicated.

Everything you mention in the second paragraph is precisely the functionality described in the posts I linked to in the third to last paragraph starting with “be that as it may”, so you are dead on there, yes exactly:
https://bjoern.brembs.net/2015/04/what-should-a-modern-scientific-infrastructure-look-like/
https://cameronneylon.net/blog/what-exactly-is-infrastructure-seeing-the-leopards-spots/
https://cameronneylon.net/blog/principles-for-open-scholarly-infrastructures/