The argument for variant calling from RNA-seq data usually centres on the fact that it can be cost-effective and remove the need to do both DNA-seq and RNA-seq (and use up both genomic DNA and cDNA in the process). When I think about 'cost effectiveness' in broader terms, I realise that it invariably equates to lower 'quality' and lower sensitivity or specificity (or both).
Whilst one can very easily call variants from RNA-seq data, one misses the following types of variants:
- variants in alleles that are not expressed (obvious)
- certain types of regulatory variants (obvious)
- variants in genes that result in the gene undergoing nonsense-mediated decay (NMD)
- variants that result in haploinsufficiency
- splicing variants
...and I'm sure that there are many more types that are missed.
The Broad Institute have a 'best practices' pipeline for RNA-seq variant calling on their website but, to my knowledge, it has never been published anywhere. When posted, it had also only been tested on a single sample (their words). I take issue with this because many look up to the Broad as a reputable organisation. When they see the Broad publishing methods, they logically assume that the method must be okay to use, and may not understand its limitations unless these are clearly stated up front. The Broad do state some limitations lower down the page, which only adds to the concern:
Finally, we know that the current recommended pipeline is producing
both false positives (wrong variant calls) and false negatives (missed
variants) errors. While some of those errors are inevitable in any
pipeline, others are errors that we can and will address in future
versions of the pipeline.
The situation appears even more alarming when one reads the anecdotal and published evidence from people who have compared RNA-seq variant calls to whole exome sequencing (WES) variant calls. Scattered across the web, I've seen reports that RNA-seq variant calling detects only between ~30% and ~70% of the variants that WES detects, and I assume that these people filtered the WES data to include only variants in exons for their comparisons.
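For anyone wanting to check a figure like that on their own samples, the comparison boils down to something like the sketch below. It is only a toy illustration, not the method used in any of the comparisons I have seen: the `Variant` tuple, the `in_exon` and `rnaseq_recall` helpers, and all coordinates are made up for the example, and it assumes that both call sets have already been normalised to the same representation and that exons are given as 1-based inclusive intervals.

```python
from typing import Iterable, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


Exon = Tuple[str, int, int]  # (chrom, start, end), 1-based inclusive


def in_exon(variant: Variant, exons: Iterable[Exon]) -> bool:
    """Return True if the variant position falls inside any exon interval."""
    return any(
        chrom == variant.chrom and start <= variant.pos <= end
        for chrom, start, end in exons
    )


def rnaseq_recall(wes_calls, rna_calls, exons) -> float:
    """Fraction of exonic WES variants that were also called from RNA-seq."""
    exon_list = list(exons)
    # Restrict the WES 'truth' set to exons so the comparison is fair.
    exonic_wes = {v for v in wes_calls if in_exon(v, exon_list)}
    if not exonic_wes:
        raise ValueError("no exonic WES variants to compare against")
    return len(exonic_wes & set(rna_calls)) / len(exonic_wes)


# Made-up toy data: four exonic WES calls plus one outside any exon.
exons = [("chr1", 100, 200), ("chr2", 500, 600)]
wes = {
    Variant("chr1", 150, "A", "G"),
    Variant("chr1", 180, "C", "T"),
    Variant("chr2", 550, "G", "A"),
    Variant("chr2", 590, "T", "C"),
    Variant("chr1", 900, "G", "C"),  # non-exonic, excluded from the denominator
}
rna = {
    Variant("chr1", 150, "A", "G"),
    Variant("chr2", 550, "G", "A"),
    Variant("chr1", 170, "T", "A"),  # RNA-seq only; does not affect recall
}

recall = rnaseq_recall(wes, rna, exons)
print(f"RNA-seq recovered {recall:.0%} of exonic WES variants")
```

Note that this only measures how much of the WES call set is recovered. It says nothing about the RNA-seq-only calls, which could be false positives, RNA editing events, or real variants that WES missed.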
I'm sorry, but I do not and will never recommend variant calling from RNA-seq data based on current procedures. If you do it, you have to be fully aware of the limitations. A recent trend in research appears to be toward cost saving in various ways, but this will gradually bring more 'noise' into our data and result in larger problems further down the line.
Others likely have other opinions. Some who defend variant calling from RNA-seq data may be those who have already performed it and published data from it.