Forum: Improved Peer Review of RNA-Seq Bioinformatics in Science Publications
3
7
10.5 years ago

It seems like every week I find at least one newly published paper that I'm excited to read after scanning the abstract (normally science-focused or statistical papers, as opposed to bioinformatics methods papers). But after reading the methods section, I notice so many bioinformatics flaws that I'm left scratching my head as to why these issues weren't raised during peer review. Most of these articles appear in respected journals.

Without calling out any specific papers, some of the common RNA-Seq issues that bother me the most are:

  1. Not using a splice-aware aligner (such as TopHat) when aligning to the genome.
  2. Aligning to hg18 instead of hg19 (which was released over 4.5 years ago!), or to another similarly outdated annotation.
  3. Using a "new" one-off method for differential expression analysis without comparing to commonly used DE tools or explaining why those tools weren't used.
  4. Not reporting version numbers for annotations and software tools used.
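
On point 4, here is a minimal sketch of how version information could be captured automatically at the end of an analysis, so that it can be pasted into a methods section. It assumes a Python-based pipeline; the function name `collect_versions`, the package names, and the annotation labels are all hypothetical placeholders, not taken from any particular paper or tool.

```python
import json
import platform
from importlib import metadata


def collect_versions(packages, annotations):
    """Return a dict recording interpreter, package, and annotation versions.

    `packages` is a list of installed distribution names to look up;
    `annotations` is a mapping the analyst fills in by hand
    (for example, the genome build that was aligned against).
    """
    record = {
        "python": platform.python_version(),
        "packages": {},
        "annotations": dict(annotations),
    }
    for name in packages:
        try:
            record["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap explicitly instead of silently skipping it.
            record["packages"][name] = "not installed"
    return record


if __name__ == "__main__":
    # Hypothetical inputs: replace with the tools and annotation actually used.
    report = collect_versions(
        packages=["numpy", "pysam"],
        annotations={"genome": "hg19", "gene_models": "placeholder-release"},
    )
    print(json.dumps(report, indent=2, sort_keys=True))
```

Writing the resulting JSON next to the results files is one way to make sure the versions reported in the paper are the ones that were actually run. Command-line tools outside Python would still need their versions recorded separately.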

Am I just being too critical, or are others noticing a rise in these types of flawed bioinformatics analyses, especially with respect to RNA-Seq? My guess as to why these errors are not being caught is that the reviewers are experts in their respective fields (biology, medicine, genetics, statistics, CS, etc.), but they are not themselves involved in the regular processing and analysis of the data, so they are unfamiliar with the analysis details.

Should the peer review process change to account for the increasingly complex bioinformatics that are required in processing/analyzing sequencing data? Some journals that I've reviewed for ask if the article under review (1) involves statistical methods and (2) whether I am qualified to review these statistical methods. Should journals start asking the same questions, but for bioinformatics? Would this help ensure that all of the methods are appropriately reviewed by bioinformatics experts? Any other ideas to solve this issue?

RNA-seq • 5.0k views
8
10.5 years ago
bede.portz ▴ 540

As a wet lab scientist, not a true bioinformaticist (by any stretch), I think one of the largest problems is the use of so-called "in-house scripts" for analysis, the details of which are not disclosed in the methods section.

I couldn't do biochemistry and write in the methods section that "the reactions were run using in-house buffers", nor could I publish a novel experiment without a methods section thorough enough to enable someone to attempt to repeat the work. The field has long agreed on what needs to be disclosed for bench science, but has not yet come to a consensus for genomics. In many of the genomics experiments with which I am familiar, the analysis is as crucial to the conclusions as the preceding bench work, if not more so, yet the methods sections disclose relatively little about how the data were analyzed.

Perhaps the problem is due in part to the fact that many of the reviewers in the fields of biochemistry and molecular biology weren't trained in genomics and couldn't adequately review the genomics methods even if more detail were provided. In other words, it might be more the fault of the molecular biology establishment than of the current generation of genomics and bioinformatics researchers.

This limited disclosure of methods has, I think, contributed to the code duplication that exists for common analyses such as mapping ChIP-seq reads to reference points, binning the data, clustering, etc. I realize that the biology and the nature of the experiments dictate to some degree the type of analysis that must be carried out, and that some customization is required. Still, it would be a great benefit to the field if bench scientists began to coalesce around some of the freely and publicly available tools, so that the tools and parameters used in an analysis could be accurately and concisely cited.

Lastly, think about how much money the NIH has spent paying graduate students and postdocs to develop the same or similar tools for analyzing data from something like a ChIP-seq experiment, all in a very difficult funding environment. With countless labs having countless variations of similar tools, some of which may not be readily available, curated, or supported after students leave, I don't see the problem improving soon.

4
10.5 years ago

All really good points, and it is really not clear what the right solution is. From my own perspective, I have found that doing a thorough review is very taxing and the rewards for doing a good job are intangible - it would take at least a full day of work to make sure a paper is correct and to document the problems if there were any. Simply put, the incentives are not there.

You know how the NHTSA (National Highway Traffic Safety Administration) crash tests every new car model? We need something similar for science. Everyone involved in a review, from authors to reviewers and journals, should get a "science crash test" rating that measures how well their work stands up to scrutiny. Probably never going to happen.

1

I agree that doing such a thorough review is taxing, and that fixing most of the mistakes I see would likely have only a minor impact on the results and conclusions. But I've also seen "groundbreaking" papers where shoddy bioinformatics is responsible for the seemingly novel results, which is pretty disturbing.

For a good example, see this Science article and comments: http://www.ncbi.nlm.nih.gov/pubmed?term=widespread[Title]%20AND%20rna[Title]%20AND%20dna[Title]%20AND%20differences[Title]%20AND%20human[Title]

1
10.5 years ago

These are all important points, I agree. Another issue being addressed by the NIH (and others) is access to the raw data. Hopefully, with increased visibility and access to data, there will be a larger incentive to be transparent in analyses.

http://grants.nih.gov/grants/guide/notice-files/NOT-OD-13-119.html
