Publishing in Open Archives

June 12, 2008 – 9:43 am

Created with support from the National Library of Sweden and its development program OpenAccess.se

Tomas Lundén, 2008
http://creativecommons.org/licenses/by-nc-sa/2.5/se/

Description
This section deals with the publishing of research publications in open archives. It is a way for the researcher to make his or her publications freely accessible (Open Access). First a background to the phenomenon is given, followed by practical explanations of which problems may be encountered and how these may be solved or avoided. These problems concern, for example, the author’s rights, the policies of the publishing companies and other practical matters.

The purpose of the section is to explain the significance of publishing in open archives and to give practical advice in relation to this.

Introduction
In the following text the term parallel publishing is used for articles which have undergone peer review and which have been published in a scholarly journal (postprints) and pre-publishing for articles which have not yet been accepted or undergone a review (preprints). When I discuss other document types I will simply use the terms publishing or depositing.

One way for the researcher to make his or her publications freely accessible is to use so-called parallel publishing (the expressions self-publishing or self-archiving are sometimes used as well). This is normally referred to as “the green road” to Open Access, as opposed to “the golden road”, which means getting published in Open Access journals. Parallel publishing means that the researcher deposits a copy of a document written by him/her in another context and published digitally (usually in a scholarly journal) on a freely accessible Web page (Self-Archiving FAQ). Preferably this will be done in the open archive of the researcher’s own seat of learning (also referred to as the institutional repository). Today most Swedish universities and university colleges have an open archive of this kind.

In some subject fields, for example computer science and physics, researchers have been practising parallel publishing for quite some time. As early as 1991, Paul Ginsparg at Los Alamos National Laboratory established a digital archive for physics named ArXiv, where researchers deposit their articles so that their colleagues may read them before the articles get accepted and published in a scholarly journal. ArXiv is thus a subject-based open archive, and there are more archives of this type, such as RePEc (economics), CogPrints (cognitive science) and CiteSeer (computer science).

One of the earliest advocates of parallel publishing was the cognitive scientist Stevan Harnad, who as early as 1994 published what he called “The subversive proposal” (Harnad 1995, p.13-14). Influenced by the establishment of ArXiv, Harnad urged researchers all over the world and in all disciplines to make their publications freely accessible via a “public ftp”, which at the time was the prevailing technology.

Technical development has moved on since then but the basic idea remains. Besides the subject-oriented archives, individual researchers have also (with or without direct influence) responded to Harnad’s call by making their publications freely accessible, on their own Web pages or on pages that belong to the research group, the department or the institution. The extent of this parallel publishing has, however, still not reached the level that Harnad called for.

In recent years many universities, university colleges and other research institutions within and outside of Sweden have established their own open archives. More often it is these, rather than subject-oriented archives, that are referred to when it comes to parallel publishing (Harnad 2006). A suitable definition of an open archive in this context has been given by Raym Crow: “…a digital archive of the intellectual product created by the faculty, research staff, and students of an institution and accessible to end users both within and outside of the institution.” (Crow 2002, p. 16). Further, Crow argues that an archive should be interoperable and that its contents should have a scholarly character. Interoperability here refers to a system that can make its contents (metadata and full texts) accessible to search engines and other services on the Internet. The standard protocol used for this purpose is the OAI-PMH (Open Archives Initiative - Protocol for Metadata Harvesting).
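To make the idea of harvesting a little more concrete, here is a minimal Python sketch of how a harvesting service might build an OAI-PMH request. The base URL is a hypothetical placeholder and not a real archive. The request arguments verb, metadataPrefix, from and set, the verb ListRecords and the metadata prefix oai_dc (unqualified Dublin Core) come from the OAI-PMH specification; everything else is an assumption made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used for illustration only.
BASE_URL = "https://archive.example.org/oai"


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    'verb', 'metadataPrefix', 'from' and 'set' are request arguments
    defined by the OAI-PMH specification.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # only records changed since this date
    if set_spec is not None:
        params["set"] = set_spec    # optional subset of the archive
    return base_url + "?" + urlencode(params)


url = build_list_records_url(BASE_URL, from_date="2008-01-01")
print(url)
```

A harvester would fetch this URL and parse the XML response; that step is left out here because it depends on the individual archive.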

Different software exists for the open archives. Some of it is so-called open-source software and is free to download and to start using, for example Eprints and DSpace. DiVA is developed at Uppsala University Library, and universities and university colleges can, at a charge, get connected to the system. Several seats of learning have developed their own systems for electronic publishing. For a survey of systems used in Sweden (as of 2005) see Holmqvist & Johansson 2005.

OpenDOAR - The Directory of Open Access Repositories, which is run by the University of Nottingham, registers open archives within academic institutions all over the world and keeps statistics on the growth of the archives. Of a total of 1,020 registered archives, 33 are from Sweden (3 January 2008).

Rights, publishing company policies and other practical matters

Scholarly articles
When you speak of Open Access through parallel publishing you first of all, as mentioned above, refer to articles which have undergone peer review and which have been published in a scholarly journal, so-called postprints. This was stated in the Budapest Open Access Initiative 2002 (see also Harnad 2001 and Harnad 2006). To make an article which has been published in a scholarly journal freely accessible the journal publishing company needs to allow this. Information about most publishing companies’ regulations can be found in the service Sherpa/Romeo, also run by the University of Nottingham.

In Sherpa a postprint is defined as an article which has undergone peer review and been accepted for publishing, and in which any changes resulting from the peer review have been incorporated. A preprint, on the other hand, is an article which has not yet been accepted or undergone peer review (see definitions in Sherpa/Romeo).

Sherpa/Romeo divides the publishing companies’ policies according to a colour range:

Green = permits parallel publishing of postprints and pre-publishing of preprints.

Blue = only permits parallel publishing of postprints.

Yellow = only permits pre-publishing of preprints.

White = permits neither parallel publishing nor pre-publishing.
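The colour scheme above is in effect a small lookup table. The following Python sketch shows one way to express it; the function names are hypothetical and chosen only for this illustration, and the sketch does not use any Sherpa/Romeo service or API.

```python
# Sherpa/Romeo colour categories as described above:
# each colour maps to (postprint allowed, preprint allowed).
ROMEO_COLOURS = {
    "green": (True, True),
    "blue": (True, False),
    "yellow": (False, True),
    "white": (False, False),
}


def classify_policy(allows_postprint, allows_preprint):
    """Return the colour name matching a publisher's archiving permissions."""
    for colour, permissions in ROMEO_COLOURS.items():
        if permissions == (allows_postprint, allows_preprint):
            return colour
    raise ValueError("permissions must be given as two booleans")


def may_deposit(colour, version):
    """Check whether a 'postprint' or a 'preprint' may be deposited."""
    postprint_ok, preprint_ok = ROMEO_COLOURS[colour]
    if version == "postprint":
        return postprint_ok
    if version == "preprint":
        return preprint_ok
    raise ValueError(f"unknown version: {version!r}")


print(classify_policy(True, False))
print(may_deposit("yellow", "postprint"))
```

Note that the colour only summarises the general policy; conditions such as embargoes or restrictions on which file may be used still have to be checked for each publishing company.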

It is not unusual for researchers to hesitate about parallel publishing for fear of violating an agreement with the publishing company. In fact, a majority of the publishing companies, 66 per cent, currently permit parallel publishing or pre-publishing, according to Sherpa/Romeo. The figures are 56 per cent for postprints and 43 per cent for preprints (7 January 2008).

At the journal level the figures are higher. According to Eprints.org 62 per cent of the journals permit parallel publishing of postprints and 29 per cent permit pre-publishing of preprints. In total, then, this adds up to 91 per cent for some form of publishing (7 January 2008). The reason why the figures for publishing companies and journals differ is that several of the publishing companies which permit parallel publishing of postprints are very large and put out a considerable number of journals, Elsevier being one example.

It should be noted that Eprints.org uses a somewhat different colour range than Sherpa/Romeo:

Eprints full green = Sherpa green + blue (permits postprints and in some cases preprints).

Eprints pale green = Sherpa yellow (only permits preprints).

It can of course happen that the publishing company the researcher uses is “white”, meaning that it does not generally permit parallel publishing, or that it is not listed in the Sherpa/Romeo database at all. In this case you can simply write to the publishing company and ask for permission. You can also, before signing the publishing contract but after the article has been accepted, request to keep the right to deposit a copy of the article in the open archive of your own seat of learning. (Read more in the section called Copyright for researchers.)

The publishing company’s versus the author’s article version
In addition to the distinction between postprint and preprint it is important to also differentiate between two variants of postprints, namely the publishing company’s published PDF file and the author’s final approved manuscript. The publishing company’s PDF is quite simply put the PDF which is published in the journal. The author’s last version is in the ideal case identical in terms of contents, but unformatted and does not contain the journal’s pagination or logotype. Most large publishing companies today only permit parallel publishing of the author’s version and not the publishing company’s PDF. This information will in this case be found under “Conditions” in the Sherpa/Romeo database.

A few examples of how this may be expressed:

• Publisher’s PDF cannot be used

• Publisher’s version cannot be used

• Author’s version of post-prints may be archived

Because it is often only the author’s version that may be deposited, it is important for the author to verify that the final version returned to him/her by the publishing company is identical to the published version in terms of contents. This is especially important when articles are deposited by representatives (librarians or administrative staff) and not by the researcher himself/herself. Depositing may, for example, require technical skills from the staff, such as inserting images and diagrams correctly in the article, something which the Medical Faculty Library at Lund University has experience of (Hultman-Özek 2005). Uncertainties regarding the status of the version may, naturally, also arise. The only way to guarantee that the author’s version is identical to the publishing company’s version is to compare the texts (Antelman 2006, p. 87). Most people would probably consider this far too resource-demanding for the library/administration. If you have a workflow which involves representatives, it is reasonable to see the handling of the versions as primarily the author’s responsibility.

As regards potential differences between the versions, a couple of studies have been published which show that there are often differences and sometimes outright faults in the articles. Most often the faults are found in the author’s version, but there are also examples of faults being introduced in the published version which did not exist in the author’s manuscript. None of the studies pointed to any major inaccuracies; the faults mainly concerned minor things which did not affect the scientific result or the understanding of this result (Goodman, Dowson & Yaremchuk 2007; Wates & Campbell 2007). However, more and larger studies on this matter are needed in order to dispel hesitation in regard to parallel-published versions.

As Antelman further points out in the article, an author’s version of a postprint, because it is unformatted, looks almost like a preprint. “Without the contextual branding of a journal or pagination, such a document is not, according to the norms of most disciplines, citable.” (Antelman 2006, p. 87). When a published article is deposited in an open archive it is therefore important to fill in bibliographic information correctly (see more below under the section How to do it and why?) and also to link to the officially published version of the article (which is something required by almost all publishing companies). Furthermore, when the full text is an author’s version it is highly advisable to add a standardized cover page which clearly gives the reference and states that the version has been peer-reviewed. The cover page is placed as the first page in the PDF file.

Here is an example of how you may formulate such a cover page for a scholarly article:

This is an author-produced version of a paper published in Journal of example science.

This paper has been peer-reviewed but does not include the final publisher proof-corrections or journal pagination.

Citation for the published paper:

Andersson, A., “Example of a paper”,

Journal of example science, 2007, volume 5, issue 5, pp. 5-10.

URL to article at publisher’s site: http://dx.doi.org/13234567889

Access to the published version may require journal subscription.

Published with permission from: Elsevier
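An archive that handles many deposits may want to generate this text from the bibliographic information instead of typing it each time. The following Python sketch fills in the wording of the example above; the function name and its parameters are hypothetical, and producing the actual first PDF page from the text is left out.

```python
def build_cover_page(journal, citation, publisher_url, publisher):
    """Return the cover page text for an author's version of an article."""
    lines = [
        f"This is an author-produced version of a paper published in {journal}.",
        "This paper has been peer-reviewed but does not include the final "
        "publisher proof-corrections or journal pagination.",
        "Citation for the published paper:",
        citation,
        f"URL to article at publisher's site: {publisher_url}",
        "Access to the published version may require journal subscription.",
        f"Published with permission from: {publisher}",
    ]
    # Separate the statements with blank lines, as in the example above.
    return "\n\n".join(lines)


cover = build_cover_page(
    journal="Journal of example science",
    citation='Andersson, A., "Example of a paper", '
             "Journal of example science, 2007, volume 5, issue 5, pp. 5-10.",
    publisher_url="http://dx.doi.org/13234567889",
    publisher="Elsevier",
)
print(cover)
```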

Embargo
An embargo in this context refers to a restriction, for example from a publishing company, regarding how soon an article can be made public in an open archive.

It can, for example, be a matter of 6 or 12 months after publication in the journal. This information is found under “Restrictions” or “Conditions” in Sherpa/Romeo, and can, for example, be expressed in the following way:

• 12 months embargo

• Publisher’s version/PDF may be used after 12 months.
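As a rough illustration, the date on which an embargoed article may be made public can be calculated from the publication date and the length of the embargo. The Python sketch below assumes that the embargo is counted in calendar months from the journal's publication date; individual publishing companies may count differently, so this is an assumption and not a rule taken from Sherpa/Romeo. The function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month,
    so 31 August plus 6 months gives the last day of February.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def embargo_end(publication_date, embargo_months):
    """Earliest date on which the article may be made public."""
    return add_months(publication_date, embargo_months)


def is_embargo_over(publication_date, embargo_months, today):
    """True if `today` is on or after the end of the embargo."""
    return today >= embargo_end(publication_date, embargo_months)


print(embargo_end(date(2007, 1, 15), 12))
```

Many archives let the depositor upload the file immediately and enter such a date, so that the full text is released automatically once the embargo has passed.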

Recommendations or requirements from research financiers
Increasingly, research financiers recommend or require that a copy of publications resulting from research funded by them be deposited in an open archive. Sherpa runs a sister service to Romeo called Juliet, which lists the financiers who have adopted a policy for this procedure. Sherpa/Juliet lists three criteria for considering the financier’s policy as entirely Open Access:

That depositing is required (meaning then mandatory)

That the deposited version be the postprint version of the article (either the publishing company’s PDF or the author’s version)

That the depositing must take place directly when the article is accepted by a journal (i.e. without any embargo period)

In the present situation (7 January 2008) no financier meets all three criteria, but several of the research councils in Great Britain (UK Research Councils) meet the first two. They still accept an embargo, however; most of them indicate 6 months.
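The three criteria can be read as a simple checklist. The Python sketch below expresses them as such; the class, its field names and the example financier are hypothetical and serve only as an illustration, and the values are modelled on the situation described above, not taken from the Juliet database.

```python
from dataclasses import dataclass


@dataclass
class FunderPolicy:
    """Hypothetical record of a research financier's deposit policy."""
    name: str
    deposit_required: bool    # criterion 1: depositing is mandatory
    postprint_required: bool  # criterion 2: the postprint must be deposited
    embargo_months: int       # criterion 3 is met only when this is 0


def criteria_met(policy):
    """Return, in order, whether each of the three criteria is met."""
    return (
        policy.deposit_required,
        policy.postprint_required,
        policy.embargo_months == 0,
    )


def is_fully_open_access(policy):
    """True only when all three criteria are met."""
    return all(criteria_met(policy))


# Illustrative values only: mandatory postprint deposit with a 6-month embargo.
example = FunderPolicy("Example Research Council", True, True, 6)
print(criteria_met(example), is_fully_open_access(example))
```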

Other document types
Research publications which have not been published in scholarly journals can also be deposited in open archives. These may be conference contributions, book chapters, entire books or reports of various kinds. For publications which have been published externally (outside of the seat of learning) the publisher’s permission to parallel publish is needed, just as for articles. The simplest way to obtain it is to contact the publishing company/publisher and request permission. This may concern commercial publishing companies, but often also national authorities or organisations of different types. Experience shows that such permission is granted strikingly often.

Many of these other types of documents consist of what is called “grey literature”. The open archives offer a possibility for the grey literature to become visible and accessible in a completely different way than before (Correia & Neto 2002; Banks 2005). One definition of grey literature reads: “Information produced on all levels of government, academics, business and industry in electronic and print formats not controlled by commercial publishing i.e. where publishing is not the primary activity of the producing body” (GreyNet : The Grey Literature Network Service). Grey literature does not normally undergo peer review but can still be of scholarly character. In this context the term refers, for example, to material published at the seat of learning or material which is not published at all. The author owns the copyright to publications published within the seat of learning and can make them freely accessible unless a special agreement to the contrary has been signed with the institution, faculty or seat of learning. Parts of dissertations, report series, some local journals etc. are included in this group. The author may also, without any problem, deposit a copy of unpublished material, such as working papers, in the local open archive.

According to statistics at OpenDOAR (7 January 2008) the most common document types in the open archives, globally seen, are still theses and dissertations together with unpublished reports and working papers, and not peer-reviewed journal articles.

How to do it and why?
A common objection from researchers against depositing their publications in open archives is that they believe or experience that it takes a lot of time and that it is complicated. The matter is not only about uploading a file somewhere, but also about describing the publication through bibliographic information (metadata). But studies show, de facto, that it is neither particularly time-consuming nor difficult to deposit publications in open archives.

A study at the University of Southampton based on the software Eprints and with server logs being studied showed that the average time for deposition of an article was ca 10 minutes. The median time was even less, 5 minutes and 37 seconds (Carr & Harnad 2005, p. 5). Basing themselves on an average number of authors who per article spent 3.33 minutes, Carr and Harnad further calculated that a researcher who published one article per month would spend about 39 minutes per year depositing his or her articles (Carr & Harnad 2005, p. 6). Another study carried out by Swan and Brown at Key Perspectives Ltd. and based on questionnaires sent out to researchers all over the world shows a similar result. 52 per cent of the respondents were of the opinion that it took only a few minutes to deposit an article (Swan & Brown 2005, p. 53f; see also Swan 2006, p. 55). Furthermore, both these studies show that the time it takes to register metadata is considerably reduced after the first article. Carr and Harnad also demonstrate that the more articles an author deposits, the faster it goes.

As regards the difficulty of depositing, Swan and Brown report that after the first article 72 per cent of the researchers found depositing easy or very easy. Only 9 per cent experienced it as difficult (Swan & Brown 2005, p. 54; Swan 2006, p. 55f).

Why, then, is metadata needed? Because good, structured metadata increases the chance that the article will be used and cited, which is what researchers desire. In the Swan and Brown study 92 per cent of the researchers stated that the reason for any type of publishing is the desire to spread research results to colleagues (“communicate results to their peers”) (Swan & Brown 2005, p. 23). A number of studies have shown that articles that are freely accessible on the Web are cited earlier and more often than articles that are only accessible via subscription-based journals (Open Citation Project: “The effect of open access and downloads (’hits’) on citation impact: a bibliography of studies”).

The institutional repositories provide the possibility to make deposited publications visible and accessible on the Web by adding structured metadata and by having the OAI-PMH protocol make sure that the data can be “harvested” and made visible in different search services and search engines (see further under Increased exposure and accessibility  - the OAI-PMH protocol and search services).

For this reason it is important to add metadata to the publication. Even if the full text for some reason may not be deposited, the information will make it easier for someone who has found the reference on the Web to decide whether the article is of interest to him/her, and in that case get hold of the full text in another way.

References

Antelman, K. (2006). Self-archiving practice and the influence of publisher policies in the social sciences. Learned Publishing 19(2), pp. 85-95 (Electronic). Access: http://dx.doi.org/10.1087/095315106776387011 (7 January 2008). Parallel-published version: http://eprints.rclis.org/archive/00006023/

Banks, M. (2005). Towards a continuum of scholarship : the eventual collapse of the distinction between grey and non-grey literature, In D. Farace & J.Frantzen (eds.), Open access to grey resources : seventh international conference on grey literature ; INIST-CNRS, Nancy, France, 5 - 6 December 2005. Amsterdam : TextRelease. ISBN 90-77484-06-X (Electronic). Access: http://eprints.rclis.org/archive/00005803/  (7 January 2008).

Carr, L. & Harnad, S. (2005). Keystroke economy : a study of the time and effort involved in self-archiving. Technical report, ECS, University of Southampton (Electronic). Access: http://eprints.ecs.soton.ac.uk/10688/ (7 January 2008).

Correia, A.M.R. & Neto, M.D. (2002). The role of eprint archives in the access to, and dissemination of, scientific grey literature : LIZA - a case study by the National Library of Portugal. Journal of Information Science 28 (3), pp. 231-41 (Electronic). Access: http://dx.doi.org/10.1177/016555150202800305 (7 January 2008).

Crow, R. (2002). The case for institutional repositories : a SPARC position paper. Washington, DC : SPARC (The Scholarly Publishing & Academic Resources Coalition) (Electronic). Access: http://www.arl.org/sparc/bm~doc/ir_final_release_102.pdf (7 January 2008).

Goodman, D., Dowson, S. & Yaremchuk, J. (2007). Open access and accuracy : author-archived manuscripts vs. published articles. Learned Publishing 20(3), pp. 203-215 (Electronic). Access: http://dx.doi.org/10.1087/095315107X204012  (7 January 2008). Parallel-published version: http://dlist.sir.arizona.edu/1968/

Harnad, S. (1995). Overture : the subversive proposal, In A. Okerson & J. O’Donnell (eds.), Scholarly journals at the crossroads : a subversive proposal for electronic publishing. Washington, DC : Association of Research Libraries. ISBN: 0-918006-26-0 (Electronic). Access: http://www.arl.org/bm~doc/subversive.pdf  (7 January 2008).

Harnad, S. (2001). The self-archiving initiative. Nature 410, pp. 1024-25 (Electronic). Access: http://dx.doi.org/10.1038/35074210 (7 January 2008). Parallel-published version: http://eprints.ecs.soton.ac.uk/5947/

Harnad, S. (2006). Optimizing OA self-archiving mandates : what? where? when? why? how?. Technical report, ECS, University of Southampton (Electronic). Access: http://eprints.ecs.soton.ac.uk/13098/ (7 January 2008).

Holmqvist, K. & Johansson, T. (2005). Organiserad vetenskaplig elektronisk publicering vid universitet och högskolor i Sverige [Organised scholarly electronic publishing at universities and university colleges in Sweden]. Master’s thesis in Computer- and Information Science, Lund University (Electronic). Access: http://theses.lub.lu.se/archive/2005/06/21/1119362831-12377-21/Organiserad_vetenskaplig_elektronisk_publicering.pdf  (7 January 2008).

Hultman-Özek, Y. (2005). Lund Virtual Medical Journal makes self-archiving attractive and easy for authors. D-Lib Magazine 11(10) (October 2005) (Electronic). Access: http://dx.doi.org/10.1045/october2005-ozek (7 January 2008).

Swan, A. (2006). The culture of open access: researchers’ views and responses, In N. Jacobs (ed.), Open access: key strategic, technical and economic aspects, pp. 52-59. Oxford : Chandos. ISBN: 1-84334-204-9 (Electronic). Access: http://eprints.ecs.soton.ac.uk/12428/ (7 January 2008).

Swan, A. & Brown, S. (2005). Open access self-archiving : an author study. Truro, UK : Key Perspectives Ltd (Electronic). Access: http://eprints.ecs.soton.ac.uk/10999/   (7 January 2008).

Wates, E. & Campbell, R. (2007). Author’s version vs. publisher’s version: an analysis of the copy-editing function. Learned Publishing 20(2), pp. 121-129 (Electronic). Access: http://dx.doi.org/10.1087/174148507X185090 (7 January 2008).
