Question: Calculating Time From Submission To Publication / Degree Of Burden In Submitting A Paper
21
6.5 years ago by
Ryan D3.3k
USA
Ryan D3.3k wrote:

I was wondering if it would be possible to calculate some kind of metric for the speed of publication for each journal. I'm not sure submitted and accepted dates are available for all papers, but I noticed that the PubMed XML data has fields like the following:

<PubMedPubDate PubStatus="received">
    <Year>2011</Year>
    <Month>12</Month>
    <Day>13</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
    <Year>2012</Year>
    <Month>4</Month>
    <Day>2</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="aheadofprint">
    <Year>2012</Year>
    <Month>4</Month>
    <Day>2</Day>
</PubMedPubDate>

Is there any tool that calculates the average time from when a paper is submitted to when it is published? Or is there a way this information could be extracted from the database to give an aggregate estimate of turnaround time? Has someone already done this? And, not to get too off topic, what other kinds of measures would be useful to evaluate the degree of burden in submitting a paper?

EDIT: Pierre really took this to the next level in answering this question. The table he produced is very interesting and informative and his complete results are posted at figshare. Check it out. Or try it out.

publication pubmed • 5.2k views
modified 6.5 years ago by Pierre Lindenbaum118k • written 6.5 years ago by Ryan D3.3k
2

I would title this question as "Degree of burden in submitting a paper" :) !

written 6.5 years ago by Khader Shameer18k
2

It would be interesting to calculate results per journal and compare to what the publisher claims is turnaround time :)

written 6.5 years ago by Neilfws48k

That's a good point. There are a lot of claims about the speed of the review process made by journals but as far as I know there is no one who checks these facts. Our experience with some journals has certainly deviated a great deal from their claims.

written 6.5 years ago by Ryan D3.3k
1

I've played with my Java program and uploaded the results to figshare: http://dx.doi.org/10.6084/m9.figshare.96403

written 6.5 years ago by Pierre Lindenbaum118k
1

Wish I had this when I was trying to calculate the embargo-induced delays in publication of the ENCODE papers http://caseybergman.wordpress.com/2012/09/05/the-cost-to-science-of-the-encode-publication-embargo/

modified 6.4 years ago • written 6.4 years ago by Casey Bergman18k

Very useful idea!

written 6.5 years ago by Ali140

This is an issue in the wet-lab world for sure: http://www.nature.com/news/2011/110427/full/472391a.html

I wonder if there is a similar phenomenon among bioinformatics journals. "Please provide tests of extra use cases..." that sort of thing. Anyone had that experience?

written 6.5 years ago by Alex Paciorkowski3.3k
11
6.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

The following Java program parses PubMed XML from stdin and prints the difference in days between "received" and "accepted":

import java.io.InputStream;
import java.util.GregorianCalendar;
import java.util.concurrent.TimeUnit;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;




public class Biostar54473
    {
    private static class PubMedPubDate
        {
        int year;
        int month=-1;
        int day=-1;
        @Override
        public String toString() {
            String s=String.format("%04d", year);
            if(month!=-1)
                {
                s+="-"+String.format("%02d", month);
                if(day!=-1)
                    {
                    s+="-"+String.format("%02d", day);
                    }
                }
            return s;
            }
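        long getTimeInMillis()
            {
            // Compute in UTC: with a local time zone, a daylight-saving change between
            // the two dates shifts the interval by an hour, and the truncating
            // conversion to days can then come out one day short.
            GregorianCalendar cal=new GregorianCalendar(java.util.TimeZone.getTimeZone("UTC"));
            cal.clear();
            cal.set(
                    year,
                    month==-1?0:month-1,
                    month==-1 || day==-1?
                    1:day);
            return cal.getTimeInMillis();
            }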
        }

    private void parse(InputStream in) throws Exception
        {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
        XMLEventReader r= factory.createXMLEventReader(in);
        String PubStatus=null;
        PubMedPubDate curr=null;
        PubMedPubDate accepted=null;
        PubMedPubDate received=null;
        String MedlineTA=null;
        String pmid=null;
        String ArticleTitle=null;
        QName attPubStatus=new QName("PubStatus");
        while(r.hasNext())
            {
            XMLEvent evt=r.nextEvent();
            if(evt.isStartElement())
                {
                String name=evt.asStartElement().getName().getLocalPart();
                if(name.equals("PubmedArticle"))
                    {
                    pmid=null;
                    accepted=null;
                    received=null;
                    MedlineTA=null;
                    ArticleTitle=null;
                    }
                else if(name.equals("ArticleTitle") && ArticleTitle==null)
                    {
                    ArticleTitle=r.getElementText().trim();
                    }
                else if(name.equals("PMID") && pmid==null)
                    {
                    pmid=r.getElementText().trim();
                    }
                else if(name.equals("MedlineTA") && MedlineTA==null)
                    {
                    MedlineTA=r.getElementText().trim();
                    }
                else if(name.equals("PubMedPubDate"))
                    {
                    curr=null;
                    Attribute att=evt.asStartElement().getAttributeByName(attPubStatus);
                    if(att!=null) PubStatus=att.getValue();

                    if("received".equals(PubStatus))
                        {
                        curr=new PubMedPubDate();
                        received=curr;
                        }
                    else if("accepted".equals(PubStatus))
                        {
                        curr=new PubMedPubDate();
                        accepted=curr;
                        }
                    else
                        {
                        curr=null;
                        }
                    }


                else if(curr!=null && name.equals("Year"))
                    {
                    // a non-numeric value invalidates the date currently being read
                    try { curr.year=Integer.parseInt(r.getElementText().trim()); } catch(Exception err) { if(curr==received) received=null; else accepted=null; curr=null;}
                    }
                else if(curr!=null && name.equals("Month"))
                    {
                    // months appear either as numbers or as English names/abbreviations
                    String month=r.getElementText().trim().toLowerCase();
                    if(month.equals("jan") || month.equals("january")) month="1";
                    else if(month.equals("feb") || month.equals("february")) month="2";
                    else if(month.equals("mar") || month.equals("march")) month="3";
                    else if(month.equals("apr") || month.equals("april")) month="4";
                    else if(month.equals("may")) month="5";
                    else if(month.equals("jun") || month.equals("june")) month="6";
                    else if(month.equals("jul") || month.equals("july")) month="7";
                    else if(month.equals("aug") || month.equals("august")) month="8";
                    else if(month.equals("sep") || month.equals("september")) month="9";
                    else if(month.equals("oct") || month.equals("october")) month="10";
                    else if(month.equals("nov") || month.equals("november")) month="11";
                    else if(month.equals("dec") || month.equals("december")) month="12";
                    try { curr.month=Integer.parseInt(month); } catch(Exception err) { if(curr==received) received=null; else accepted=null; curr=null;}
                    }
                else if(curr!=null && name.equals("Day"))
                    {
                    try { curr.day=Integer.parseInt(r.getElementText().trim()); } catch(Exception err) { if(curr==received) received=null; else accepted=null; curr=null;}
                    }

                }
            else if(evt.isEndElement())
                {
                String name=evt.asEndElement().getName().getLocalPart();
                if(name.equals("PubmedArticle"))
                    {
                    if(received!=null && accepted!=null)
                        {
                        long n=accepted.getTimeInMillis()-received.getTimeInMillis();
                        System.out.println(
                                pmid+"\t"+
                                ArticleTitle+"\t"+
                                MedlineTA+"\t"+
                                received+"\t"+
                                accepted+"\t"+
                                TimeUnit.DAYS.convert(n, TimeUnit.MILLISECONDS)
                                );
                        }
                    ArticleTitle=null;
                    MedlineTA=null;
                    pmid=null;
                    curr=null;
                    received=null;
                    accepted=null;
                    }
                else if(name.equals("PubMedPubDate"))
                    {
                    curr=null;
                    }
                }
            }
        }
    public static void main(String[] args) throws Exception
        {
        System.out.println("#pmid\t"+
                "ArticleTitle\t"+
                "MedlineTA\t"+
                "Received\t"+
                "Accepted\t"+
                "DiffDays"
                );
        new Biostar54473().parse(System.in);
        }

}

A 'verticalized' example for a few papers containing the phrase "Next generation Sequencing" in the title. You can load this into R or any other tool to get some stats about a journal, a subject, etc.

$ javac Biostar54473.java && cat pubmed_result.xml | java Biostar54473

>>>    2
$1    #pmid           23020966
$2    ArticleTitle    Transcriptome analysis using next-generation sequencing.
$3    MedlineTA       Curr Opin Biotechnol
$4    Received        2012-07-04
$5    Accepted        2012-09-04
$6    DiffDays        62
<<<    2

>>>    3
$1    #pmid           23000871
$2    ArticleTitle    Understanding pathogens in the era of next generation sequencing.
$3    MedlineTA       J Infect Dev Ctries
$4    Received        2012-09-13
$5    Accepted        2012-09-14
$6    DiffDays        1
<<<    3

>>>    4
$1    #pmid           22994565
$2    ArticleTitle    Accurate variant detection across non-amplified and whole genome amplified DNA using targeted next generation sequencing.
$3    MedlineTA       BMC Genomics
$4    Received        2012-01-30
$5    Accepted        2012-09-20
$6    DiffDays        234
<<<    4
(...)
>>>    253
$1    #pmid           18604217
$2    ArticleTitle    Alta-Cyclic: a self-optimizing base caller for next-generation sequencing.
$3    MedlineTA       Nat Methods
$4    Received        2008-03-10
$5    Accepted        2008-06-02
$6    DiffDays        84
<<<    253

>>>    254
$1    #pmid           18262675
$2    ArticleTitle    The impact of next-generation sequencing technology on genetics.
$3    MedlineTA       Trends Genet
$4    Received        2007-11-15
$5    Accepted        2007-12-17
$6    DiffDays        32
<<<    254
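
For a per-journal summary without leaving Java, a minimal sketch along these lines could read the tab-separated output of the program above and report the median DiffDays per journal. The class name JournalTurnaround and its helper methods are hypothetical and not part of the original program; the column positions assume the header printed by Biostar54473 (MedlineTA in the third column, DiffDays in the sixth).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class JournalTurnaround {
    /** Groups the DiffDays column (index 5) by the MedlineTA column (index 2). */
    static Map<String, List<Long>> groupByJournal(List<String> lines) {
        Map<String, List<Long>> byJournal = new TreeMap<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) continue; // header or blank line
            String[] cols = line.split("\t", -1);
            if (cols.length < 6) continue; // not a complete record
            try {
                long days = Long.parseLong(cols[5].trim());
                byJournal.computeIfAbsent(cols[2], k -> new ArrayList<>()).add(days);
            } catch (NumberFormatException err) {
                // skip records whose DiffDays is not a number
            }
        }
        return byJournal;
    }

    /** Median of a non-empty list; mean of the two middle values when the size is even. */
    static double median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1 ? sorted.get(n / 2)
                          : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) lines.add(line);
        }
        System.out.println("#MedlineTA\tN\tMedianDays");
        for (Map.Entry<String, List<Long>> e : groupByJournal(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue().size() + "\t" + median(e.getValue()));
        }
    }
}
```

One possible invocation, assuming both classes are compiled in the current directory: cat pubmed_result.xml | java Biostar54473 | java JournalTurnaround. The median is used here rather than the mean so that a handful of very long review times do not dominate a journal's figure.
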
modified 6.5 years ago • written 6.5 years ago by Pierre Lindenbaum118k
1

The year/month/day values are not always valid integers. I've updated my code to catch the errors.
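
As an illustration of the same idea with the java.time API, here is a minimal sketch of tolerant date parsing that returns null instead of throwing when a field is malformed. The class PubDateParser and its methods are hypothetical names, not part of the program above, and unlike that program this sketch does not default a missing month or day to the first.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.Month;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

public class PubDateParser {
    /** Converts "4", "04", "Apr" or "April" to 1..12; returns -1 if unrecognized. */
    static int parseMonth(String text) {
        String s = text.trim().toUpperCase(Locale.ROOT);
        for (Month m : Month.values()) {
            String full = m.name(); // e.g. "APRIL"
            if (s.equals(full) || s.equals(full.substring(0, 3))) return m.getValue();
        }
        try {
            int n = Integer.parseInt(s);
            return (n >= 1 && n <= 12) ? n : -1;
        } catch (NumberFormatException err) {
            return -1;
        }
    }

    /** Builds a date, or returns null when any field is malformed or out of range. */
    static LocalDate toDate(String year, String month, String day) {
        try {
            int m = parseMonth(month);
            if (m == -1) return null;
            return LocalDate.of(Integer.parseInt(year.trim()), m, Integer.parseInt(day.trim()));
        } catch (NumberFormatException | DateTimeException err) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Dates from the XML excerpt in the question; the month is written as "Apr"
        // here only to exercise the month-name handling.
        LocalDate received = toDate("2011", "12", "13");
        LocalDate accepted = toDate("2012", "Apr", "2");
        // Calendar-day difference, unaffected by time zones or daylight saving.
        System.out.println(ChronoUnit.DAYS.between(received, accepted)); // prints 111
    }
}
```

Working with LocalDate rather than millisecond timestamps is a design choice: the difference is counted in calendar days, so no time zone is involved.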

written 6.5 years ago by Pierre Lindenbaum118k

Fantastic. Thanks for such an awesome answer, Pierre.

written 6.5 years ago by Ryan D3.3k

This looks like it should work, but I'm not very familiar with Java and I got an error:

$ javac Biostar54473.java && cat pubmed_result.xml | java Biostar54473
#pmid ArticleTitle MedlineTA Received Accepted DiffDays
Exception in thread "main" javax.xml.stream.XMLStreamException: ParseError at [row,col]:[132,2]
Message: The markup in the document following the root element must be well-formed.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:591)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83)
    at Biostar54473.parse(Biostar54473.java:63)
    at Biostar54473.main(Biostar54473.java:162)

Any ideas?

written 6.5 years ago by Ryan D3.3k
1

Please run "xmllint pubmed_result.xml" to check your XML file.
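
If xmllint is not at hand, a rough equivalent can be sketched with the same StAX API the program uses: walk every event and report the first parse error. WellFormedCheck is a hypothetical helper, not part of the program above. A message about markup "following the root element" typically points to something after the closing root tag, for example a second root element from two result files pasted together, though that is a guess about this particular file.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class WellFormedCheck {
    /** Returns null if the stream parses to the end, otherwise the parser's error message. */
    static String check(InputStream in) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (r.hasNext()) r.next(); // walk every event until the end of the document
            r.close();
            return null;
        } catch (XMLStreamException err) {
            return err.getMessage(); // includes the [row,col] position of the problem
        }
    }

    public static void main(String[] args) {
        String problem = check(System.in);
        System.out.println(problem == null ? "well-formed" : problem);
    }
}
```

It could be run as: cat pubmed_result.xml | java WellFormedCheck. A real PubMed export declares a DOCTYPE, and how the external DTD is handled depends on the parser's factory settings, so treat this as a quick check only.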

written 6.5 years ago by Pierre Lindenbaum118k

Perfect. That showed my XML file was malformed, and a new file worked perfectly. One way I can think of to improve this would be to use an alternate date if one of those is not available. For instance, of 2608 PubMed articles on "Next Generation Sequencing", I only get output for 1114, because only those have entries for both <PubMedPubDate PubStatus="received"> and <PubMedPubDate PubStatus="accepted">. This is still really great. And doing as Pierre said and loading the results into R can give a great idea of the average "degree of burden" in submitting a paper, as Khader called it. :-) Masterful. Thanks again, Pierre.
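
The alternate-date idea could be sketched roughly as follows: keep every date of an article keyed by its PubStatus, then take the first status from a preference list that is actually present. EndDateFallback and firstAvailable are hypothetical names, and the preference order ("accepted", then "aheadofprint") is only an assumption based on the statuses shown in the question.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EndDateFallback {
    /** Returns the first status in `preferred` that has a date, or null if none does. */
    static String firstAvailable(Map<String, LocalDate> datesByStatus, List<String> preferred) {
        for (String status : preferred) {
            if (datesByStatus.get(status) != null) return status;
        }
        return null;
    }

    public static void main(String[] args) {
        // Dates of one made-up article, keyed by PubStatus; "accepted" is missing.
        Map<String, LocalDate> dates = new HashMap<>();
        dates.put("received", LocalDate.of(2011, 12, 13));
        dates.put("aheadofprint", LocalDate.of(2012, 4, 2));

        // Assumed preference order: use "aheadofprint" only when "accepted" is absent.
        String used = firstAvailable(dates, Arrays.asList("accepted", "aheadofprint"));
        System.out.println(used + "\t" + dates.get(used));
    }
}
```

Printing which status was used matters, because intervals that end at different events are not directly comparable and should probably be reported separately.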

written 6.5 years ago by Ryan D3.3k
2

Will you prepare a manuscript reporting your results? Keep us up to date!

written 6.5 years ago by Flow1.5k
7

Hopefully the reviewers do not request that you apply your method to the current paper, and thus enter an infinite recursion loop.

written 6.5 years ago by Matt Shirley8.9k

Now seriously, I am sure this has been previously studied and reported in some of those bibliometrics journals. Who will be the first to find some of these papers? :)

written 6.5 years ago by Flow1.5k

That's hilarious. Really, I had just wondered out of my own curiosity. I think our rather large group would like to know.

written 6.5 years ago by Ryan D3.3k