Calculating Time From Submission To Publication / Degree Of Burden In Submitting A Paper
10.0 years ago
Ryan D ★ 3.4k

I was wondering if it would be possible to calculate some kind of speed-of-publication metric for each journal. I'm not sure submitted and accepted dates are available for all papers, but I noticed that the PubMed XML data contains fields like the following:

<PubMedPubDate PubStatus="received">
<Year>2011</Year>
<Month>12</Month>
<Day>13</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2012</Year>
<Month>4</Month>
<Day>2</Day>
</PubMedPubDate>
<Year>2012</Year>
<Month>4</Month>
<Day>2</Day>


Is there a tool that calculates the average time from a paper's submission to its publication? Or is there a way to extract this information from the database to give an aggregate estimate of turnaround time? Has someone already done this? And, not to get too off topic, what other measures would be useful for evaluating the degree of burden in submitting a paper?

EDIT: Pierre really took this to the next level in answering this question. The table he produced is very interesting and informative and his complete results are posted at figshare. Check it out. Or try it out.

pubmed publication • 6.6k views

I would title this question as "Degree of burden in submitting a paper" :) !


It would be interesting to calculate results per journal and compare to what the publisher claims is turnaround time :)


That's a good point. There are a lot of claims about the speed of the review process made by journals but as far as I know there is no one who checks these facts. Our experience with some journals has certainly deviated a great deal from their claims.


I've played with my java program and uploaded the results on figshare: http://dx.doi.org/10.6084/m9.figshare.96403


Wish I had this when I was trying to calculate the embargo-induced delays in publication of the ENCODE papers http://caseybergman.wordpress.com/2012/09/05/the-cost-to-science-of-the-encode-publication-embargo/


Very useful idea!


This is an issue in the wet-lab world for sure: http://www.nature.com/news/2011/110427/full/472391a.html

I wonder if there is a similar phenomenon among bioinformatics journals. "Please provide tests of extra use cases..." that sort of thing. Anyone had that experience?

10.0 years ago

The following Java program parses PubMed XML from stdin and prints the difference in days between the "received" and "accepted" dates:

import java.io.InputStream;
import java.util.GregorianCalendar;
import java.util.concurrent.TimeUnit;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

public class Biostar54473
{
    /** A date read from a PubMedPubDate element; month and day are -1 when absent. */
    private static class PubMedPubDate
    {
        int year;
        int month=-1;
        int day=-1;
        @Override
        public String toString() {
            String s=String.format("%04d", year);
            if(month!=-1)
            {
                s+="-"+String.format("%02d", month);
                if(day!=-1)
                {
                    s+="-"+String.format("%02d", day);
                }
            }
            return s;
        }
        /* A missing month or day defaults to the first month or day. */
        long getTimeInMillis()
        {
            GregorianCalendar cal=new GregorianCalendar(
                year,
                month==-1?0:month-1,
                month==-1 || day==-1?1:day);
            return cal.getTimeInMillis();
        }
    }

    private void parse(InputStream in) throws Exception
    {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
        XMLEventReader r=factory.createXMLEventReader(in);
        PubMedPubDate curr=null;      // date currently being filled, if any
        PubMedPubDate received=null;
        PubMedPubDate accepted=null;
        String MedlineTA=null;
        String pmid=null;
        String ArticleTitle=null;
        boolean ok=true;              // false if a date field of the record could not be parsed
        QName attPubStatus=new QName("PubStatus");
        while(r.hasNext())
        {
            XMLEvent evt=r.nextEvent();
            if(evt.isStartElement())
            {
                String name=evt.asStartElement().getName().getLocalPart();
                if(name.equals("PubmedArticle"))
                {
                    // new record: reset everything
                    received=null;
                    accepted=null;
                    MedlineTA=null;
                    pmid=null;
                    ArticleTitle=null;
                    ok=true;
                }
                else if(name.equals("ArticleTitle") && ArticleTitle==null)
                {
                    ArticleTitle=r.getElementText().trim();
                }
                else if(name.equals("PMID") && pmid==null)
                {
                    pmid=r.getElementText().trim();
                }
                else if(name.equals("MedlineTA") && MedlineTA==null)
                {
                    MedlineTA=r.getElementText().trim();
                }
                else if(name.equals("PubMedPubDate"))
                {
                    // only the "received" and "accepted" dates are kept
                    Attribute att=evt.asStartElement().getAttributeByName(attPubStatus);
                    String PubStatus=(att==null?null:att.getValue());
                    if("received".equals(PubStatus))
                    {
                        curr=new PubMedPubDate();
                        received=curr;
                    }
                    else if("accepted".equals(PubStatus))
                    {
                        curr=new PubMedPubDate();
                        accepted=curr;
                    }
                    else
                    {
                        curr=null;
                    }
                }
                else if(curr!=null && name.equals("Year"))
                {
                    try { curr.year=Integer.parseInt(r.getElementText().trim()); } catch(Exception err) { curr=null;ok=false;}
                }
                else if(curr!=null && name.equals("Month"))
                {
                    // the month can be a number or an English name
                    String month=r.getElementText().trim().toLowerCase();
                    if(month.equals("jan") || month.equals("january")) month="1";
                    else if(month.equals("feb") || month.equals("february")) month="2";
                    else if(month.equals("mar") || month.equals("march")) month="3";
                    else if(month.equals("apr") || month.equals("april")) month="4";
                    else if(month.equals("may")) month="5";
                    else if(month.equals("jun") || month.equals("june")) month="6";
                    else if(month.equals("jul") || month.equals("july")) month="7";
                    else if(month.equals("aug") || month.equals("august")) month="8";
                    else if(month.equals("sep") || month.equals("september")) month="9";
                    else if(month.equals("oct") || month.equals("october")) month="10";
                    else if(month.equals("nov") || month.equals("november")) month="11";
                    else if(month.equals("dec") || month.equals("december")) month="12";
                    try { curr.month=Integer.parseInt(month); } catch(Exception err) { curr=null;ok=false;}
                }
                else if(curr!=null && name.equals("Day"))
                {
                    try { curr.day=Integer.parseInt(r.getElementText().trim()); } catch(Exception err) { curr=null;ok=false;}
                }
            }
            else if(evt.isEndElement())
            {
                String name=evt.asEndElement().getName().getLocalPart();
                if(name.equals("PubmedArticle"))
                {
                    // print the record only if both dates were found and parsed
                    if(ok && received!=null && accepted!=null)
                    {
                        // whole days, truncated; computed in the default time zone, so a
                        // daylight-saving change between the two dates can cost one day
                        long n=accepted.getTimeInMillis()-received.getTimeInMillis();
                        System.out.println(
                            pmid+"\t"+
                            ArticleTitle+"\t"+
                            MedlineTA+"\t"+
                            received+"\t"+
                            accepted+"\t"+
                            TimeUnit.DAYS.convert(n, TimeUnit.MILLISECONDS)
                            );
                    }
                    ArticleTitle=null;
                    MedlineTA=null;
                    pmid=null;
                    curr=null;
                    received=null;
                    accepted=null;
                }
                else if(name.equals("PubMedPubDate"))
                {
                    curr=null;
                }
            }
        }
        r.close();
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println("#pmid\t"+
            "ArticleTitle\t"+
            "MedlineTA\t"+
            "Received\t"+
            "Accepted\t"+
            "DiffDays"
            );
        new Biostar54473().parse(System.in);
    }
}


A 'verticalized' example for a few papers containing the words "Next generation Sequencing" in the title. You can read this output into R or any other tool to get some stats about a journal, a subject, etc.

$ javac Biostar54473.java && cat pubmed_result.xml | java Biostar54473

>>> 2
$1    #pmid           23020966
$2    ArticleTitle    Transcriptome analysis using next-generation sequencing.
$3    MedlineTA       Curr Opin Biotechnol
$4    Received        2012-07-04
$5    Accepted        2012-09-04
$6    DiffDays        62
<<< 2
>>> 3
$1    #pmid           23000871
$2    ArticleTitle    Understanding pathogens in the era of next generation sequencing.
$3    MedlineTA       J Infect Dev Ctries
$4    Received        2012-09-13
$5    Accepted        2012-09-14
$6    DiffDays        1
<<< 3
>>> 4
$1    #pmid           22994565
$2    ArticleTitle    Accurate variant detection across non-amplified and whole genome amplified DNA using targeted next generation sequencing.
$3    MedlineTA       BMC Genomics
$4    Received        2012-01-30
$5    Accepted        2012-09-20
$6    DiffDays        233
<<< 4
(...)
>>> 253
$1    #pmid           18604217
$2    ArticleTitle    Alta-Cyclic: a self-optimizing base caller for next-generation sequencing.
$3    MedlineTA       Nat Methods
$4    Received        2008-03-10
$5    Accepted        2008-06-02
$6    DiffDays        83
<<< 253
>>> 254
$1    #pmid           18262675
$2    ArticleTitle    The impact of next-generation sequencing technology on genetics.
$3    MedlineTA       Trends Genet
$4    Received        2007-11-15
$5    Accepted        2007-12-17
$6    DiffDays        32
<<< 254
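For anyone who would rather stay in Java than load the table into R, here is a minimal sketch of a per-journal summary. It assumes the six-column, tab-separated output described above (pmid, ArticleTitle, MedlineTA, Received, Accepted, DiffDays) arrives on stdin. The class name `JournalDelayStats` and its helper methods are hypothetical, not part of the original program, and the median is just one possible choice of summary statistic.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: summarises DiffDays per journal from the tab-separated output.
public class JournalDelayStats {

    /** Median of a non-empty list of day counts. */
    public static double median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1
            ? sorted.get(n / 2)
            : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    /** Groups DiffDays (column 6) by MedlineTA (column 3); header and malformed lines are skipped. */
    public static Map<String, List<Long>> groupByJournal(List<String> lines) {
        Map<String, List<Long>> byJournal = new TreeMap<>();
        for (String line : lines) {
            if (line.startsWith("#")) continue;      // header line
            String[] cols = line.split("\t");
            if (cols.length < 6) continue;           // assumption: exactly the six columns above
            try {
                long days = Long.parseLong(cols[5].trim());
                byJournal.computeIfAbsent(cols[2], k -> new ArrayList<>()).add(days);
            } catch (NumberFormatException e) {
                // DiffDays is not a number: ignore this line
            }
        }
        return byJournal;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) lines.add(line);
        }
        System.out.println("#MedlineTA\tN\tMedianDays");
        for (Map.Entry<String, List<Long>> e : groupByJournal(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue().size() + "\t" + median(e.getValue()));
        }
    }
}
```

Something like `java Biostar54473 < pubmed_result.xml | java JournalDelayStats` would then print one line per journal. The median is less sensitive than the mean to a handful of papers that spent years in review.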


The year/month/day values are not always valid integers. I've updated my code to catch the errors.
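As a possible alternative to catching the errors field by field, here is a minimal sketch that validates the three values together with `java.time` (Java 8 or later). The class name `PubDateParsing` and its methods are hypothetical. Unlike the program above, this sketch does not translate month names such as "Jan", so those are treated as invalid here. Because `LocalDate` carries no time of day, the day count does not depend on the time zone or on daylight-saving changes, which a millisecond-based difference can be sensitive to.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Optional;

// Hypothetical helper: turns raw Year/Month/Day strings into a validated date.
public class PubDateParsing {

    /**
     * Builds a date from raw strings. A null month or day defaults to 1.
     * Returns empty when a value is not an integer or the date does not exist.
     */
    public static Optional<LocalDate> toDate(String year, String month, String day) {
        if (year == null) return Optional.empty();
        try {
            int y = Integer.parseInt(year.trim());
            int m = (month == null) ? 1 : Integer.parseInt(month.trim());
            int d = (month == null || day == null) ? 1 : Integer.parseInt(day.trim());
            return Optional.of(LocalDate.of(y, m, d));
        } catch (NumberFormatException | DateTimeException e) {
            return Optional.empty();
        }
    }

    /** Whole calendar days from received to accepted. */
    public static long daysBetween(LocalDate received, LocalDate accepted) {
        return ChronoUnit.DAYS.between(received, accepted);
    }

    public static void main(String[] args) {
        // Dates taken from the XML example in the question.
        Optional<LocalDate> received = toDate("2011", "12", "13");
        Optional<LocalDate> accepted = toDate("2012", "4", "2");
        if (received.isPresent() && accepted.isPresent()) {
            System.out.println(daysBetween(received.get(), accepted.get()));
        }
        // A non-numeric month yields an empty result instead of an exception.
        System.out.println(toDate("2012", "Spring", null).isPresent());
    }
}
```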


Fantastic. Thanks for such an awesome answer, Pierre.


This looks like it should work, but I'm not very familiar with Java. I got an error when running:

javac Biostar54473.java && cat pubmed_result.xml | java Biostar54473

# pmid ArticleTitle MedlineTA Received Accepted DiffDays

Exception in thread "main" javax.xml.stream.XMLStreamException: ParseError at [row,col]:[132,2]
Message: The markup in the document following the root element must be well-formed.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:591)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83)
    at Biostar54473.parse(Biostar54473.java:63)
    at Biostar54473.main(Biostar54473.java:162)

Any ideas?


Perfect. That showed my XML file was malformed. The new file worked perfectly. One way I can think of to improve this would be to use an alternate date if one of those is not available. For instance, of 2608 PubMed articles on "Next Generation Sequencing", I only get output for a subset. This is because only 1114 have an entry for both <PubMedPubDate PubStatus="received"> and <PubMedPubDate PubStatus="accepted">. This is still really great. And doing as Pierre said and loading the results into R can give a great idea of the average "degree of burden" in submitting a paper, as Khadeer called it. :-) Masterful. Thanks again, Pierre.
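The alternate-date idea could be sketched roughly as follows: collect every date of a record keyed by its PubStatus, then take the first one available from a preference list. The class name `FallbackPubDate`, the method `firstAvailable`, the preference order and the sample dates are all assumptions made for illustration. The status names "epublish" and "pubmed" should be checked against the PubStatus values actually present in your XML.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: picks a substitute date when the preferred one is missing.
public class FallbackPubDate {

    /** Returns the date of the first status in the preference list that the record contains. */
    public static Optional<LocalDate> firstAvailable(Map<String, LocalDate> datesByStatus,
                                                     List<String> preferredStatuses) {
        for (String status : preferredStatuses) {
            LocalDate date = datesByStatus.get(status);
            if (date != null) return Optional.of(date);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up record with a "received" date but no "accepted" date.
        Map<String, LocalDate> record = new HashMap<>();
        record.put("received", LocalDate.of(2012, 1, 30));
        record.put("pubmed", LocalDate.of(2012, 9, 22));

        // Assumed preference order for the end of the interval.
        Optional<LocalDate> end = firstAvailable(record, Arrays.asList("accepted", "epublish", "pubmed"));
        System.out.println(end.map(LocalDate::toString).orElse("no usable date"));
    }
}
```

Note that a fallback changes what is being measured (for example, submission to indexing rather than submission to acceptance), so it would be worth adding an output column that records which status was used.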


Will you prepare a manuscript reporting your results? Keep us up to date!


Hopefully the reviewers do not request that you apply your method to the current paper, and thus enter an infinite recursion loop.


Now seriously, I am sure this has been previously studied and reported in some of those bibliometrics journals. Who will be the first to find some of these papers? :)


That's hilarious. Really, I had just wondered out of my own curiosity. I think our rather large group would like to know.