Question: Does Pre-Publishing Open Source Algorithms Hurt Chances Of Getting Published?
Martijn Van Iersel (Netherlands) wrote 7.5 years ago:

As a matter of principle, I've always made the source of my software available before publication, and I have never had trouble with journals because of this. By "available" I mean nothing more than that the software is in a world-readable Subversion repository, for example on Google Code, sourceforge.net or a similar site.

However, I've received warnings that pre-publishing open source software like this might hurt chances of a paper getting accepted in certain journals, especially if the software contains novel algorithms.

So my questions are:

  • Has anybody ever had a paper rejected because the software was already available in a public subversion repository?
  • Which journals would (not) make an issue of this?
  • Do you think this issue is different for novel algorithms versus more mundane types of software?

edit: I'll add one more:

  • Do you know any counter-examples, i.e. software with advanced algorithms that was publicly available and then published in a high-impact journal?

edit2: Although I've accepted a good answer, the more evidence the better. If you know any more, please share!

publication open algorithm • 2.7k views
modified 3.6 years ago by Biostar • written 7.5 years ago by Martijn Van Iersel

I can't think of any reason why this should lead to a rejection; can you? Something like "this is already published" (on your own website), or because the code is bad? I don't think that would make sense; I thought journals were keen on reproducibility and transparency of methods. Also, it is mainly the reviewers' decision, isn't it?

written 7.5 years ago by Michael Dondrup

In the eyes of journals, novelty is defined with respect to the work of your peers, not with respect to whether or not you have shared your novel work.

written 7.5 years ago by Aaronquinlan

@Michael Dondrup, a possible argument is that journals only publish novel research, and if something is already publicly available, it is no longer novel.

written 7.5 years ago by Martijn Van Iersel

Yes! I would like to add that certain high-impact journals consider poster presentations (when made public, e.g. on a website) to be prior publication :(

written 7.5 years ago by ALchEmiXt

lh3 (United States) wrote 7.5 years ago:

Releasing the software before publication may hurt if someone else learns your idea from the source code. However, others more often learn your idea from your poster or talk, so releasing the source code does not hurt much more. And even at the tiny risk of letting others learn your novel idea, it is still preferable to advertise and release your tool early, once it starts functioning. Publishing a paper usually takes months at the minimum. You may lose potential users in this period; when you finally publish your work, most may have already chosen their favorite. This is especially true in a hot and competitive field. If you want others to use your tool, release early.

I do not think the warnings you got have anything to do with publication policy. Numerous popular tools published in good journals were released long before publication. If you are asking for examples: GATK in Nature Genetics; IGV in Nature BioTech; QCall, Thunder, MAQ and SOAPsnp in Genome Research ... (this list will definitely be longer than a screen on your computer).

EDIT: Here are a few examples from my own work, although biologists may not consider Bioinformatics a high-profile journal.

  • BWA: source code made public on 06/03/2008; manuscript submitted on 02/20/2009 and accepted on 05/12/2009 (11 months before acceptance).

  • BWA-SW: source code made public on 08/20/2009; manuscript submitted on 09/19/2009 and accepted on 12/16/2009 (4 months before acceptance).

  • MAQ: source code in public SVN since 03/22/2007; manuscript submitted on 03/07/2008 and accepted on 08/13/2008 (17 months before acceptance).

  • Samtools: source code in public SVN since 11/25/2008; initial manuscript submitted on 04/28/2009 and accepted on 05/30/2009 (6 months before acceptance). As Brentp has pointed out, the samtools methodology paper was written with all the source code changes directly committed to the public repository.

  • PSMC: git repository public since 01/22/2011; manuscript accepted on 05/20/2011 (4 months before acceptance). This tool is in no way popular, though.

I have not really checked out the source code of GATK outside Broad (GATK was available in SVN, but I cannot check now whether it was public at that time). I am relatively certain that IGV has been open source for years. You need to register, but you can get the source code (GPL'ed?). Anyway, if you want to see convincing examples from other projects, here are a few:

  • Bowtie: source code released on 08/18/2008; manuscript accepted on 03/04/2009 by Genome Biology. TopHat from the same group was released one year before acceptance.

  • SOAPsnp: source code released on 11/13/2008; manuscript accepted on 03/11/2009 by Genome Research. SOAP from the same group was also released before acceptance.

  • BigWig/BigBed: this feature has long been available in the UCSC source code tree, but published only recently.

  • Phrap: no one will doubt that Phrap can be published at least in Genome Research. You may argue it is not open source, though.

There are more examples if I really wanted to dig them out. Also, sometimes a project may not keep its full history. It is hard to convince you with such examples.

modified 7.5 years ago • written 7.5 years ago by lh3

@lh3 doesn't mention his own example: variant calling was implemented in samtools, with the maths available here: http://bit.ly/stmath for some time. In addition, the actual manuscript has been available here https://github.com/lh3/smtl-paper since May, and the paper was just published in advance access in September (in a high-impact journal): http://bioinformatics.oxfordjournals.org/content/early/2011/09/08/bioinformatics.btr509.abstract

written 7.5 years ago by brentp

I can confirm the GATK was available from SVN long before moving to GitHub. I pulled the source and dug into some problems I was having at the time in early 2010.

written 7.5 years ago by Brad Chapman

Similarly, bedtools was initially released in early 2009, but not published until early 2010. Two reviewer comments explicitly stated its relatively wide acceptance prior to submission as a strong point.

written 7.5 years ago by Aaronquinlan

I'm trying to find evidence that the source code of GATK and IGV was really available before they were published. GATK is now on GitHub, but according to the GATK wiki that was set up in June 2011, after publication. IGV requires registration before you can download it, which is definitely not the same as publicly available. To convince skeptics I really need shiny examples, and these aren't. I haven't checked the other tools you mention.

written 7.5 years ago by Martijn Van Iersel

The GATK source code was previously available in SVN (http://www.broadinstitute.org/gsa/wiki/index.php?title=Downloading_the_GATK&oldid=670), but now I cannot access that SVN even inside Broad, so I do not know if it was publicly accessible two years ago.

written 7.5 years ago by lh3

Thanks for the more detailed examples!

written 7.5 years ago by Martijn Van Iersel

Sean Davis (National Institutes of Health, Bethesda, MD) wrote 7.5 years ago:

I cannot speak for every software project out there, but I do know about the Bioconductor project, which now has several hundred developers and packages, many of which have been published. I have not heard of any developer having his/her algorithms rejected because of inclusion in Bioconductor prior to submission of the paper. In fact, I venture to say that the opposite is true and that software that is public and is being used (as evidenced by download numbers, etc.) is more likely to be accepted by journals.

Of course, the safest approach is to:

  • identify your journals of interest
  • read their instructions for authors (those of Bioinformatics and BMC Bioinformatics, for example)
  • when in doubt about their stance on the issue, ask the editors for clarification.
written 7.5 years ago by Sean Davis