Question: Calculating time from submission to publication / Degree of burden in submitting a paper
 
 
 

I was wondering if it would be possible to calculate some kind of a metric for the speed-of-publication for each journal. I'm not sure submitted and accepted dates are available for all papers, but I noticed in XML data there are fields like the following:

<PubMedPubDate PubStatus="received">
    <Year>2011</Year>
    <Month>12</Month>
    <Day>13</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
    <Year>2012</Year>
    <Month>4</Month>
    <Day>2</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="aheadofprint">
    <Year>2012</Year>
    <Month>4</Month>
    <Day>2</Day>
</PubMedPubDate>

Is there a tool that calculates the average time from a paper's submission to its publication? Or could this information be extracted from the database to give an aggregate estimate of turnaround time? Has someone already done this? And, not to get too off topic, what other measures would be useful for evaluating the degree of burden in submitting a paper?

EDIT: Pierre really took this to the next level in answering this question. The table he produced is very interesting and informative and his complete results are posted at figshare. Check it out. Or try it out.

 

Very useful idea!

written 8 months ago by asharifiz
 

I would title this question as "Degree of burden in submitting a paper" :) !

written 8 months ago by Khader Shameer
 

It would be interesting to calculate results per journal and compare to what the publisher claims is turnaround time :)

written 8 months ago by Neilfws
 

That's a good point. There are a lot of claims about the speed of the review process made by journals but as far as I know there is no one who checks these facts. Our experience with some journals has certainly deviated a great deal from their claims.

written 8 months ago by Ryan D
 

This is an issue in the wet-lab world for sure: http://www.nature.com/news/2011/110427/full/472391a.html

I wonder if there is a similar phenomenon among bioinformatics journals. "Please provide tests of extra use cases..." that sort of thing. Anyone had that experience?

written 8 months ago by Alex Paciorkowski
 

I've played with my java program and uploaded the results on figshare: http://dx.doi.org/10.6084/m9.figshare.96403

written 8 months ago by Pierre Lindenbaum
 

Wish I had this when I was trying to calculate the embargo-induced delays in publication of the ENCODE papers http://caseybergman.wordpress.com/2012/09/05/the-cost-to-science-of-the-encode-publication-embargo/

written 8 months ago by Casey Bergman

1 answer

 
 
 
 

The following Java program parses PubMed XML from stdin and prints the number of days between "received" and "accepted":

import java.io.InputStream;
import java.util.GregorianCalendar;
import java.util.concurrent.TimeUnit;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;




public class Biostar54473
    {
    private static class PubMedPubDate
        {
        int year;
        int month=-1;
        int day=-1;
        @Override
        public String toString() {
            String s=String.format("%04d", year);
            if(month!=-1)
                {
                s+="-"+String.format("%02d", month);
                if(day!=-1)
                    {
                    s+="-"+String.format("%02d", day);
                    }
                }
            return s;
            }
        long getTimeInMillis()
            {
            GregorianCalendar cal=new GregorianCalendar(
                    year,
                    month==-1?0:month-1,
                    month==-1 || day==-1?
                    1:day);
            return cal.getTimeInMillis();
            }
        }

    private void parse(InputStream in) throws Exception
        {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
        XMLEventReader r= factory.createXMLEventReader(in);
        String PubStatus=null;
        PubMedPubDate curr=null;
        PubMedPubDate accepted=null;
        PubMedPubDate received=null;
        String MedlineTA=null;
        String pmid=null;
        String ArticleTitle=null;
        QName attPubStatus=new QName("PubStatus");
        while(r.hasNext())
            {
            XMLEvent evt=r.nextEvent();
            if(evt.isStartElement())
                {
                String name=evt.asStartElement().getName().getLocalPart();
                if(name.equals("PubmedArticle"))
                    {
                    pmid=null;
                    accepted=null;
                    received=null;
                    MedlineTA=null;
                    pmid=null;
                    ArticleTitle=null;
                    }
                else if(name.equals("ArticleTitle") && ArticleTitle==null)
                    {
                    ArticleTitle=r.getElementText().trim();
                    }
                else if(name.equals("PMID") && pmid==null)
                    {
                    pmid=r.getElementText().trim();
                    }
                else if(name.equals("MedlineTA") && MedlineTA==null)
                    {
                    MedlineTA=r.getElementText().trim();
                    }
                else if(name.equals("PubMedPubDate"))
                    {
                    curr=null;
                    Attribute att=evt.asStartElement().getAttributeByName(attPubStatus);
                    if(att!=null) PubStatus=att.getValue();

                    if("received".equals(PubStatus))
                        {
                        curr=new PubMedPubDate();
                        received=curr;
                        }
                    else if("accepted".equals(PubStatus))
                        {
                        curr=new PubMedPubDate();
                        accepted=curr;
                        }
                    else
                        {
                        curr=null;
                        }
                    }


                else if(curr!=null && name.equals("Year"))
                    {
                    // An invalid year discards the date currently being read.
                    try { curr.year=Integer.parseInt(r.getElementText().trim()); }
                    catch(Exception err) { if(curr==received) received=null; else accepted=null; curr=null; }
                    }
                else if(curr!=null && name.equals("Month"))
                    {
                    // Months appear as numbers or as English names/abbreviations.
                    String month=r.getElementText().trim().toLowerCase();
                    if(month.equals("jan") || month.equals("january")) month="1";
                    else if(month.equals("feb") || month.equals("february")) month="2";
                    else if(month.equals("mar") || month.equals("march")) month="3";
                    else if(month.equals("apr") || month.equals("april")) month="4";
                    else if(month.equals("may")) month="5";
                    else if(month.equals("jun") || month.equals("june")) month="6";
                    else if(month.equals("jul") || month.equals("july")) month="7";
                    else if(month.equals("aug") || month.equals("august")) month="8";
                    else if(month.equals("sep") || month.equals("september")) month="9";
                    else if(month.equals("oct") || month.equals("october")) month="10";
                    else if(month.equals("nov") || month.equals("november")) month="11";
                    else if(month.equals("dec") || month.equals("december")) month="12";
                    try { curr.month=Integer.parseInt(month); }
                    catch(Exception err) { if(curr==received) received=null; else accepted=null; curr=null; }
                    }
                else if(curr!=null && name.equals("Day"))
                    {
                    try { curr.day=Integer.parseInt(r.getElementText().trim()); }
                    catch(Exception err) { if(curr==received) received=null; else accepted=null; curr=null; }
                    }

                }
            else if(evt.isEndElement())
                {
                String name=evt.asEndElement().getName().getLocalPart();
                if(name.equals("PubmedArticle"))
                    {
                    if(received!=null && accepted!=null)
                        {
                        long n=accepted.getTimeInMillis()-received.getTimeInMillis();
                        System.out.println(
                                pmid+"\t"+
                                ArticleTitle+"\t"+
                                MedlineTA+"\t"+
                                received+"\t"+
                                accepted+"\t"+
                                TimeUnit.DAYS.convert(n, TimeUnit.MILLISECONDS)
                                );
                        }
                    ArticleTitle=null;
                    MedlineTA=null;
                    pmid=null;
                    curr=null;
                    received=null;
                    accepted=null;
                    }
                else if(name.equals("PubMedPubDate"))
                    {
                    curr=null;
                    }
                }
            }
        }
    public static void main(String[] args) throws Exception
        {
        System.out.println("#pmid\t"+
                "ArticleTitle\t"+
                "MedlineTA\t"+
                "Received\t"+
                "Accepted\t"+
                "DiffDays"
                );
        new Biostar54473().parse(System.in);
        }

}
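
One caveat on the day count: subtracting two `GregorianCalendar` millisecond values and truncating to whole days depends on the JVM's default time zone, so an interval that crosses a daylight-saving change can come out one day short. For example, 2012-01-30 to 2012-09-20 is 234 calendar days, which may explain the 233 reported for PMID 22994565 in the output below. A minimal sketch of a time-zone-independent alternative using `java.time` (Java 8+) follows; the class and method names are made up for illustration and are not part of the program above.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Biostar54473: counts calendar days
// between two dates without going through milliseconds or time zones.
final class DayDiffSketch {

    /** Builds a date; a missing month or day (-1) defaults to 1, mirroring the program above. */
    static LocalDate toDate(int year, int month, int day) {
        int m = (month == -1) ? 1 : month;
        int d = (month == -1 || day == -1) ? 1 : day;
        return LocalDate.of(year, m, d);
    }

    /** Whole calendar days from received to accepted (negative if accepted is earlier). */
    static long daysBetween(LocalDate received, LocalDate accepted) {
        return ChronoUnit.DAYS.between(received, accepted);
    }

    public static void main(String[] args) {
        // Dates taken from the XML fragment in the question.
        LocalDate received = toDate(2011, 12, 13);
        LocalDate accepted = toDate(2012, 4, 2);
        System.out.println(daysBetween(received, accepted)); // prints 111
    }
}
```

`LocalDate` carries no time-of-day or zone, so the result is the same on any machine.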

A 'verticalized' example for a few papers containing the phrase "Next generation Sequencing" in the title. You can read the tab-delimited output into R or whatever to get some stats about a journal, a subject, etc.

$ javac Biostar54473.java && cat pubmed_result.xml | java Biostar54473

>>>    2
$1    #pmid           23020966
$2    ArticleTitle    Transcriptome analysis using next-generation sequencing.
$3    MedlineTA       Curr Opin Biotechnol
$4    Received        2012-07-04
$5    Accepted        2012-09-04
$6    DiffDays        62
<<<    2

>>>    3
$1    #pmid           23000871
$2    ArticleTitle    Understanding pathogens in the era of next generation sequencing.
$3    MedlineTA       J Infect Dev Ctries
$4    Received        2012-09-13
$5    Accepted        2012-09-14
$6    DiffDays        1
<<<    3

>>>    4
$1    #pmid           22994565
$2    ArticleTitle    Accurate variant detection across non-amplified and whole genome amplified DNA using targeted next generation sequencing.
$3    MedlineTA       BMC Genomics
$4    Received        2012-01-30
$5    Accepted        2012-09-20
$6    DiffDays        233
<<<    4
(...)
>>>    253
$1    #pmid           18604217
$2    ArticleTitle    Alta-Cyclic: a self-optimizing base caller for next-generation sequencing.
$3    MedlineTA       Nat Methods
$4    Received        2008-03-10
$5    Accepted        2008-06-02
$6    DiffDays        83
<<<    253

>>>    254
$1    #pmid           18262675
$2    ArticleTitle    The impact of next-generation sequencing technology on genetics.
$3    MedlineTA       Trends Genet
$4    Received        2007-11-15
$5    Accepted        2007-12-17
$6    DiffDays        32
<<<    254
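
For those who would rather stay in Java than load the table into R, the per-journal statistics mentioned above could be sketched roughly as follows. It assumes the raw tab-delimited output of the program (six columns, header line starting with `#`), not the verticalized view; the class name `JournalDelaySketch` and the choice of count, mean, and median are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical post-processing step: groups DiffDays (column 6) by MedlineTA (column 3).
final class JournalDelaySketch {

    /** Reads tab-delimited rows and collects the day counts per journal, sorted by journal name. */
    static Map<String, List<Long>> groupByJournal(BufferedReader in) throws IOException {
        Map<String, List<Long>> byJournal = new TreeMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty() || line.startsWith("#")) continue; // skip header and blank lines
            String[] f = line.split("\t", -1);
            if (f.length < 6) continue;                           // skip malformed rows
            try {
                long days = Long.parseLong(f[5].trim());
                byJournal.computeIfAbsent(f[2], k -> new ArrayList<>()).add(days);
            } catch (NumberFormatException err) {
                // non-numeric DiffDays: ignore the row
            }
        }
        return byJournal;
    }

    /** Median of a non-empty list; the input list is left unmodified. */
    static double median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return (n % 2 == 1)
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("#MedlineTA\tN\tMeanDays\tMedianDays");
        for (Map.Entry<String, List<Long>> e : groupByJournal(in).entrySet()) {
            List<Long> days = e.getValue();
            double mean = days.stream().mapToLong(Long::longValue).average().orElse(Double.NaN);
            System.out.printf("%s\t%d\t%.1f\t%.1f%n", e.getKey(), days.size(), mean, median(days));
        }
    }
}
```

It could be chained after the parser, for example `java Biostar54473 < pubmed_result.xml | java JournalDelaySketch`. The median is reported next to the mean because a few very slow papers can pull the mean up considerably.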
 

This looks like it should work, though I'm not very familiar with Java. I got an error when running:

javac Biostar54473.java && cat pubmed_result.xml | java Biostar54473

pmid ArticleTitle MedlineTA Received Accepted DiffDays

Exception in thread "main" javax.xml.stream.XMLStreamException: ParseError at [row,col]:[132,2]
Message: The markup in the document following the root element must be well-formed.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:591)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83)
    at Biostar54473.parse(Biostar54473.java:63)
    at Biostar54473.main(Biostar54473.java:162)

Any ideas?

written 8 months ago by Ryan D
 

please run "xmllint pubmed_result.xml" to check your xml file.

written 8 months ago by Pierre Lindenbaum
 

Perfect. That showed my XML file was malformed. The new file worked perfectly. One way I can think of to improve this would be to use an alternate date if one of those is not available. For instance, of 2608 PubMed articles on "Next Generation Sequencing", I only get output for 1114. This is because only 1114 have entries for both <PubMedPubDate PubStatus="received"> and <PubMedPubDate PubStatus="accepted">. This is still really great. And doing as Pierre said and loading the results into R can give a great idea of the average "degree of burden" in submitting a paper, as Khader called it. :-) Masterful. Thanks again, Pierre.
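
A rough sketch of that alternate-date idea, in Java to match Pierre's program: pick the first available PubStatus from a preference list. The preference order, the use of statuses such as "epublish" and "pubmed", and the class name are assumptions for illustration only. A fallback also changes what is being measured, so the status actually used should be reported alongside the day count.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical fallback logic; not part of the original program.
final class FallbackDateSketch {

    // Assumed order of preference for the end of the interval.
    static final List<String> END_PREFERENCE =
            Arrays.asList("accepted", "aheadofprint", "epublish", "pubmed");

    /** Returns the first status in the preference list for which a date was recorded. */
    static Optional<String> pickStatus(Map<String, String> datesByStatus, List<String> preference) {
        for (String status : preference) {
            if (datesByStatus.containsKey(status)) {
                return Optional.of(status);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // An article with no "accepted" date: PubStatus -> date string.
        Map<String, String> dates = new LinkedHashMap<>();
        dates.put("received", "2011-12-13");
        dates.put("aheadofprint", "2012-04-02");

        Optional<String> end = pickStatus(dates, END_PREFERENCE);
        // Report which status was used together with its date.
        System.out.println(end.map(s -> s + "\t" + dates.get(s)).orElse("no end date"));
    }
}
```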

written 8 months ago by Ryan D
 

will you prepare a manuscript indicating your results? keep us up to date!

written 8 months ago by Flow
 

Hopefully the reviewers do not request that you apply your method to the current paper, and thus enter an infinite recursion loop.

written 8 months ago by Matt Shirley
 

Now seriously, I am sure this has been previously studied and reported in some of those bibliometrics journals. Who will be the first to find some of these papers? :)

written 8 months ago by Flow
 

That's hilarious. Really, I had just wondered out of my own curiosity. I think our rather large group would like to know.

written 8 months ago by Ryan D
 

The year/month/day values are not always valid integers. I've updated my code to catch the errors.

written 8 months ago by Pierre Lindenbaum
 

Fantastic. Thanks for such an awesome answer, Pierre.

written 8 months ago by Ryan D
 