Compliance Monitoring (By Richard Jones)

In this final part of the Jisc Monitor project, Cottage Labs will be looking to tie up a number of threads that we have been working on throughout.

Open Access compliance checking is currently a task carried out by humans, and there is no single place to look for the relevant information. This makes it time-consuming, and a prime candidate for total or partial automation. Being able to quickly and easily check the compliance of an article or a set of articles will benefit both institutions and funders. Aspects of an article which may be considered (depending on policy) include:

  • What is the licence a work has been published under?
  • What embargo is the work subject to?
  • Does the article include the funder acknowledgement?
  • Is the article archived in a suitable repository?
  • Does the article include an acknowledgement of the research materials?
  • Is the article free-to-read (i.e. accessible without login/payment, irrespective of the re-use licence)?

We want to see what our options are for automatically detecting this information, to model the data for interchange, and to consider how it plays alongside the APC aggregation.

In earlier stages we looked at the following pieces which will be drawn together for these next steps:

  1. Creation of a client environment for querying the Open Article Gauge with lists of identifiers for which we wish to determine the actual published licence conditions – this gives us the licence and free-to-read status on articles where possible. Software.
  2. Definition of a data model for exchanging information about APCs
  3. An aggregation of APC data, upon which reports and analysis can be built (with some demonstration graphical tools). Software and temporary live demonstrator (which will change throughout the project)

In parallel to the Jisc Monitor project, Cottage Labs has also been working with the Wellcome Trust on issues around compliance, which involved integration with OAG, Europe PMC and DOAJ. We therefore plan to take advantage of this existing work, and to extend the software so that it also supports the following information relevant to compliance:

  1. Is the content available in the Open University’s CORE service – therefore, along with EPMC, the work is archived in a suitable repository
  2. What are the journal’s embargo policies, according to Sherpa FACT/RoMEO – therefore a best-guess at what embargo has been applied to the article
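As a rough illustration of the "best-guess" embargo in item 2, the sketch below estimates when an embargo would end, given a publication date and an embargo length in months. The function names are hypothetical, and the embargo length is assumed to have already been looked up from the journal's policy; no Sherpa FACT/RoMEO API call is shown.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month, so
    31 August + 6 months gives 28 (or 29) February.
    """
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def estimated_embargo_end(published, embargo_months):
    """Best guess at the embargo end date from the journal-level policy."""
    return add_months(published, embargo_months)


def is_under_embargo(published, embargo_months, today):
    """True if `today` falls before the estimated embargo end date."""
    return today < estimated_embargo_end(published, embargo_months)
```

This remains a guess at the article level: the journal policy gives the embargo a publisher normally applies, not the one actually applied to a particular article.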

This leaves us with the more tricky compliance aspects which relate to the actual textual content of an article: the funder acknowledgement and the acknowledgement of research materials.

For the latter, there is little prior art and little standardisation, so for the time being we will leave it alone.

For the former, though, there are some “standard” texts that authors should use, plus we can potentially look for the funder names, or the grant codes associated with an article. We are therefore conducting a short investigation into how easily we can set up searches in HTML and PDF full-texts to detect statements that look like funder acknowledgements, and are looking at the ContentMine project as a likely starting point to carry out the advanced text analysis required.
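To give a feel for the kind of search we have in mind, here is a minimal sketch that scans plain text (already extracted from HTML or PDF) for sentences that look like funder acknowledgements. The funder list, the cue words and the grant-code pattern are illustrative assumptions, not an agreed specification, and this is far simpler than the analysis ContentMine could offer.

```python
import re

# Illustrative assumptions: a real list of funders would come from elsewhere.
FUNDER_NAMES = ["Wellcome Trust", "Medical Research Council"]

# Assumed grant-code shape: 5-6 digits with optional slash-separated parts,
# e.g. 098765/Z/12/Z. Real grant codes vary from funder to funder.
GRANT_CODE = re.compile(r"\b\d{5,6}(?:/[A-Z0-9]+)*\b")

# Words that suggest a sentence is about funding.
CUE_WORDS = re.compile(r"\b(?:funded|supported|financed|grant)\b", re.IGNORECASE)


def find_acknowledgements(text, funders=FUNDER_NAMES):
    """Return sentences that mention a known funder alongside a funding cue word."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    hits = []
    for sentence in sentences:
        matched = [f for f in funders if f.lower() in sentence.lower()]
        if matched and CUE_WORDS.search(sentence):
            hits.append({
                "sentence": sentence.strip(),
                "funders": matched,
                "grant_codes": GRANT_CODE.findall(sentence),
            })
    return hits
```

Requiring both a funder name and a cue word in the same sentence is a deliberate choice to cut down on false positives, for example where a funder is only mentioned in an author affiliation.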

On the flip-side to detecting this information, we also wish to be able to model the compliance and serialise it for interchange. To that end, we will be taking the data model we worked on for the APC aggregation, and extending it to cover these compliance aspects. This has the added advantage that the APC aggregation could actually become the APC+Compliance Aggregation, and we would be able to construct reports over both aspects of the data.
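As a sketch of what a combined APC and compliance record might look like when serialised, consider the following. The field names are hypothetical and are not taken from the actual APC data model; the point is only that each compliance aspect sits alongside the APC data, and that "unknown" is kept distinct from "no".

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ComplianceRecord:
    """Hypothetical combined record; None means 'not yet determined'."""
    identifier: str                              # e.g. a DOI
    apc_amount_gbp: Optional[float] = None       # from the APC aggregation
    licence: Optional[str] = None                # e.g. as detected via OAG
    embargo_months: Optional[int] = None         # best guess from journal policy
    free_to_read: Optional[bool] = None
    funder_acknowledged: Optional[bool] = None
    archived_in: List[str] = field(default_factory=list)  # repositories holding a copy


def serialise(record):
    """Serialise a record to JSON for interchange."""
    return json.dumps(asdict(record), sort_keys=True)
```

For example, `serialise(ComplianceRecord(identifier="10.1234/example", licence="CC BY", archived_in=["Europe PMC"]))` produces a JSON object in which the undetermined aspects appear as `null` rather than `false`, so a report can tell "not compliant" apart from "not checked".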

One thought on “Compliance Monitoring (By Richard Jones)”

  1. Petr Knoth

    CORE provides access to the full-text whenever the full-text is made available by the repository. The PDF file is pre-processed and converted into plain text that can be requested through the CORE API. Searching for the licence information in the full-text is then just a matter of a few regular expressions. The OpenAIRE project has shown that this approach is quite effective and that many full-texts do contain this information. Unfortunately, in many cases these might be the publisher accepted versions and not the pre-/post-prints.
