The argument for variant calling from RNA-seq data usually centres on cost: it can be cost-effective and removes the need to do both DNA-seq and RNA-seq (and to use up both DNA and RNA / mRNA / cDNA in the process). When I think about 'cost effectiveness' in broader terms, I realise that it invariably equates to lower 'quality' and lower sensitivity or specificity (or both) compared to gold standards.
Whilst one can very easily call variants from RNA-seq data, one misses the following types of variants:
- variants in alleles that are not expressed (obvious)
- certain types of regulatory variants (obvious)
- variants that cause the transcript to undergo nonsense-mediated decay (NMD) (these may still be detected depending on the wet-lab capture method employed)
- variants that result in haploinsufficiency
- splicing variants
- variants in lowly expressed genes, e.g., non-coding genes (these could be detected by accepting calls at low read depths, but then you would introduce false-positive variant calls elsewhere)
...and I'm sure that there are many more types that are missed.
The Broad Institute have a 'best practices' pipeline for RNA-seq variant calling on their website but, to my knowledge, it has never been published in a scientific journal. When posted, it had also only been tested on a single sample (their words). I take issue with this because many look up to the Broad as a reputable organisation. When they see the Broad outlining methods on their website, they logically assume that the method must be okay to use, and they may not understand its limitations unless those limitations are clearly stated up front. The Broad do state some limitations lower down, which raises concern:
> Finally, we know that the current recommended pipeline is producing
> both false positives (wrong variant calls) and false negatives (missed
> variants) errors. While some of those errors are inevitable in any
> pipeline, others are errors that we can and will address in future
> versions of the pipeline.
The situation appears even more alarming when one reads anecdotal and published evidence from people who have compared RNA-seq variant calls to whole-exome sequencing (WES) variant calls. In reports scattered across the web, I've seen that RNA-seq variant calling detects only between ~30% and ~70% of the variants that WES detects, and I assume that these people filtered the WES data so that their comparisons only include variants in exons that could reasonably be expected to be covered by RNA-seq reads.
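To make that kind of comparison concrete, here is a minimal sketch of how a 'recovery rate' could be computed: the fraction of WES variants, restricted to a set of regions, that also appear in the RNA-seq calls. Everything here is an assumption for illustration: the `Variant` and `Region` types, the `recovery_rate` function, and the toy coordinates are all made up, and a real comparison would read VCF and BED files and would need to normalise variant representation first.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def in_regions(variant: Variant, regions: Iterable[Region]) -> bool:
    """True if the variant position falls inside any of the regions."""
    return any(
        r.chrom == variant.chrom and r.start <= variant.pos <= r.end
        for r in regions
    )


def recovery_rate(wes_calls, rna_calls, regions) -> float:
    """Fraction of WES calls inside `regions` that the RNA-seq calls also contain."""
    regions = list(regions)
    wes_in_regions = {v for v in wes_calls if in_regions(v, regions)}
    if not wes_in_regions:
        raise ValueError("no WES variants fall inside the comparison regions")
    recovered = wes_in_regions & set(rna_calls)
    return len(recovered) / len(wes_in_regions)


# Toy, made-up data: exons assumed to be expressed, plus two call sets.
expressed_exons = [Region("chr1", 100, 200), Region("chr2", 500, 600)]
wes = [
    Variant("chr1", 120, "A", "G"),
    Variant("chr1", 150, "C", "T"),
    Variant("chr2", 550, "G", "A"),
    Variant("chr2", 590, "T", "C"),
    Variant("chr3", 10, "A", "T"),   # outside the regions, so excluded
]
rna = [
    Variant("chr1", 120, "A", "G"),
    Variant("chr2", 550, "G", "A"),
    Variant("chr2", 700, "C", "G"),  # RNA-only call, ignored by this metric
]

rate = recovery_rate(wes, rna, expressed_exons)
print(f"RNA-seq recovered {rate:.0%} of in-region WES variants")
```

Note that this only measures what RNA-seq misses relative to WES; RNA-only calls (possible false positives, or RNA editing sites) would need a separate count.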
So, if you do variant calling from RNA-seq, you have to be fully aware of the limitations. A recent trend in research appears to be toward cost saving in various ways, but this will gradually bring more 'noise' into our data and result in larger problems further down the line.
Others likely have other opinions. Some who defend variant calling from RNA-seq data may be those who have already performed it and published data from it. On that note, also keep in mind that most journals are profit-oriented and need to publish works in order to survive. The field is also now 'flooded' with bogus journals that will publish anything, fantasy or otherwise.
In general, transcription may not involve both copies/alleles of a gene. The two alleles might be expressed in different ratios, one of them might be imprinted, or expression might be low under the measured conditions. Whatever information comes from RNA-seq is limited to the copies that are transcriptionally active and measurable under the conditions of the experiment.
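As a rough illustration of why allelic ratio and expression level matter, the sketch below computes the chance of seeing enough reads carrying the alternate allele to support a call. It assumes reads are independent draws (a simple binomial model), and the `min_alt_reads` threshold and the function name are hypothetical choices of mine, not the model used by any particular variant caller.

```python
from math import comb


def prob_at_least_k_alt_reads(depth: int, alt_fraction: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` of `depth` reads carry the alt allele).

    Assumes each read independently carries the alt allele with
    probability `alt_fraction` (binomial model, an illustrative assumption).
    """
    if not 0.0 <= alt_fraction <= 1.0:
        raise ValueError("alt_fraction must be between 0 and 1")
    # Sum the probabilities of seeing fewer than the required alt reads.
    p_fewer = sum(
        comb(depth, i) * alt_fraction**i * (1.0 - alt_fraction) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Compare balanced expression with skewed allelic expression at two depths.
for depth in (10, 100):
    for alt_fraction in (0.5, 0.1, 0.0):
        p = prob_at_least_k_alt_reads(depth, alt_fraction, min_alt_reads=3)
        print(f"depth={depth:>3} alt_fraction={alt_fraction:.1f} P(detect)={p:.3f}")
```

With an alternate-allele fraction of zero (e.g., a fully silenced or imprinted allele), the probability is zero at any depth, which is the point made above: no amount of sequencing recovers an allele that is not transcribed.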