Re: Free Access vs. Open Access

Re: Free Access vs. Open Access SERIALST Moderator 12 Aug 2003 13:50 UTC
Date: Tue, 12 Aug 2003 09:14:45 +0100
From: Jan Velterop <jan@biomedcentral.com>
Subject: RE: Free Access vs. Open Access

Posting on behalf of Matt Cockerill:

Stevan asks:

         "The use one makes of those full texts is to read them,
          print them off, quote/comment them, cite them, and use
          their *contents* in further research, building on them.
          What is "re-use"? And what is "redistribution" (when
          everyone on the planet with access to the web has access
          to the full-text of every such article)?"

Having free access to articles on the publisher's website would certainly
offer progress compared to the current status quo. But it would not offer
anything like the benefits of true open access. Here are just some of the
reasons why re-use and re-distribution rights are vital to open access:

(1) Digital permanence - it is not enough for the publisher to be the only
body which curates the full archive of published research content. To ensure
long term digital permanence of the scientific record, it is vital that
articles should be deposited with multiple archives, and redistributable
from and between those archives.

(2) A flexible choice of tools for searching and browsing
Google exists because the web is free for anyone to download and index.
As a result, there is competition among search engines, and Google had the
incentive to develop a better system for indexing web pages, which has since
driven other search engine companies to improve the tools they offer.

Compare this with the situation in scientific research. If the research
resides only on the publisher's site, you don't have a free choice of what
tools you use to search and browse it - you are stuck with whatever that
particular publisher provides.

This ties in with developments in Grid computing (e.g.
http://www.escience-grid.org.uk/ ). With open access, published research
would be available "on tap" via the grid, and scientists would be able to
use their preferred choice of grid tools to access the data, rather than
being stuck with the tools provided by the publisher.

(3) Datamining

With a million or so biomedical research articles being published each year,
the sheer volume of output is an obstacle to the comprehension and synthesis
of the results reported in that research. If the XML of the articles can be
brought together in one place then the tools of datamining can be applied to
it to extract useful but non-obvious information.

The simplest type of datamining is citation analysis.

Currently you need to pay ISI a lot of money to find out what cites what,
but with true open access, citation analysis becomes trivial.

So, for example, if you view a PubMed record:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11667947&dopt=Abstract
you already get links to all the full text articles in PubMed Central which
cite that PubMed item:
http://www.pubmedcentral.gov/tocrender.fcgi?action=cited&tool=pubmed&pubmedid=11667947

The more true open access research that is published and archived at PubMed
Central, the more useful this becomes for biomedical researchers. [Sure,
"screen-scraping" HTML from free articles displayed on publisher sites could
give some citation information, but with nothing like the ease, accuracy and
reliability with which it can be obtained from XML data, as at PubMed
Central].
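
As a rough illustration of why citation analysis becomes simple once the
XML of the articles is gathered in one place, here is a minimal Python
sketch. The XML layout used here (an "article" element with an "id"
attribute, containing "ref" elements with a "cites" attribute) is invented
for the example and is not the actual PubMed Central format, and the article
ids A1 to A3 are made up. It simply inverts each article's reference list
into a "cited by" index.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical article XML: the element and attribute names are illustrative
# assumptions, not the real PubMed Central or BioMed Central schema.
SAMPLE_ARTICLES = [
    '<article id="A1"><ref-list>'
    '<ref cites="11667947"/><ref cites="222"/>'
    '</ref-list></article>',
    '<article id="A2"><ref-list><ref cites="11667947"/></ref-list></article>',
    '<article id="A3"><ref-list><ref cites="222"/></ref-list></article>',
]


def build_cited_by_index(xml_documents):
    """Map each cited identifier to the sorted ids of the articles citing it."""
    cited_by = defaultdict(set)
    for doc in xml_documents:
        root = ET.fromstring(doc)
        citing_id = root.get("id")
        # Every <ref> in the article contributes one "citing -> cited" link.
        for ref in root.iter("ref"):
            cited = ref.get("cites")
            if cited:
                cited_by[cited].add(citing_id)
    return {cited: sorted(citing) for cited, citing in cited_by.items()}


index = build_cited_by_index(SAMPLE_ARTICLES)
print(index["11667947"])  # articles in the sample corpus that cite this item
```

Inverting the reference lists in this way is all that a "cited by" lookup
needs. With HTML scraped from publisher sites, the hard part would be
recovering the references reliably in the first place.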

Beyond citation analysis, many other forms of datamining are possible.
For more information see:
http://www.biomedcentral.com/info/about/datamining/

For example, research articles can be mined for details of protein
interactions:
http://bioinfo.mshri.on.ca/prebind/

And as scientific content is increasingly marked up using richer forms of
semantically meaningful XML (e.g. CML for chemical structures, MathML for
equations), the value of datamining will continue to increase.

The BioLINK group are using BioMed Central's open access corpus as the raw
material for a datamining competition, designed to stimulate progress in the
development of tools for biological datamining.
http://www.pdg.cnb.uam.es/BioLINK/BioCreative_task2.html

(4) Derivative works and compilations
Say that a scientist performs a meta-analysis on a group of published
clinical trials, and wants to make available the conclusions of that
research. Or perhaps a datamining researcher has taken a corpus of 1000
articles on breast cancer, and established some interesting conclusions.

In a true open access environment, each is free to post the results of their
research, *along with* the actual corpus of data which the research was
based on (effectively, the raw data for that research).
But in a non-open access environment, that raw data (i.e. the research
articles) cannot be redistributed, which makes it far more difficult than it
needs to be for other scientists to reproduce, critique and follow up the
work.

Similarly, a scientist may wish to make a point by assembling a collection
of certain articles or article fragments (perhaps they wish to assemble a
comparison of the methods used for a certain technique).
In an open access world, as long as they cite the sources, they are
completely free to create and redistribute that compilation. Such a
selective compilation may in itself be an extremely useful contribution to
science.

(5) Print redistribution rights - the National Health Service, for example,
should be able to redistribute thousands of printed copies of an important
research article (which it may have funded) to its doctors if it wishes to
do so. It should not have to pay a hefty copyright fee for the privilege.
Certainly, print redistribution will likely become less significant in the
future, but there is no logical reason that the scientific community should
not be free to exchange and distribute the research that it has created in
print form, as well as online.

Matt Cockerill

==
Matthew Cockerill Ph.D.
Technical Director
BioMed Central Limited (http://www.biomedcentral.com)
34-42, Cleveland Street
London W1T 4LB

Email: matt@biomedcentral.com