Has anyone tried calling variants from RNA-seq data and comparing them with WGS/exome sequencing variant calls in coding regions? I'm curious whether the same variant callers can be used on RNA-seq alignments (say, TopHat alignments). Also, are there tools that can predict RNA-editing or similar events?
You should check out the SNVMix papers here and here. They developed their method and applied it to RNA-seq tumor data, comparing the calls against the "ground truth" of genotype arrays and WGS. They also showed that their approach could identify RNA-editing events. They have a follow-up method for matched tumor-normal samples called JointSNVMix, although I think that one was developed more for exome-seq.
We've done quite a bit of variant calling from mRNA-Seq data to identify EMS mutants, but we haven't compared the calls with WGS/exome data yet. We used BWA for mapping, and both samtools and the GATK pipeline for variant calling; the two yielded pretty consistent results. One thing that turned out to be very important for our purpose, i.e. detecting high-quality SNPs in coding regions, is that you have to trim aggressively to remove low-quality bases, even at the cost of losing coverage in some areas. With really stringent quality trimming, we've successfully identified several mutant alleles that could be verified by Sanger sequencing or restriction enzyme genotyping.
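To make the trimming point concrete, here is a minimal Python sketch of 3'-end quality trimming. It is not the pipeline used above; it is only modeled on the running-sum idea behind BWA's `-q` option. The function name `trim_3prime` and the threshold of 20 are assumptions for illustration, and the qualities are assumed to be already decoded to integer Phred scores.

```python
def trim_3prime(seq, quals, threshold=20):
    """Trim low-quality bases from the 3' end of a read.

    Walks in from the 3' end, accumulating (threshold - quality), and cuts
    at the position where that running sum is largest. The walk stops as
    soon as the sum goes negative, i.e. once good bases outweigh bad ones.
    `quals` holds integer Phred scores (hypothetical input format).
    """
    best_cut = len(seq)   # default: keep the whole read
    best_sum = 0
    running = 0
    for i in range(len(seq) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break
        if running > best_sum:
            best_sum = running
            best_cut = i
    return seq[:best_cut], quals[:best_cut]


if __name__ == "__main__":
    read = "ACGTACGT"
    phred = [30, 30, 30, 30, 30, 10, 5, 2]
    print(trim_3prime(read, phred))
```

Raising `threshold` trims more aggressively, which is the coverage-for-quality trade-off described above. In practice you would use an established trimmer rather than rolling your own.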
Thanks, we are following a similar approach as well, so it's good to know we are not alone :). What reference did you use for the BWA alignments? A custom transcriptome built from known transcripts (>150,000?), or some trick to get spliced alignments out of BWA?
For the mapping question, it is probably worth looking at: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btr427v1
What would you call consistent results? We routinely see an over-representation of false positives near splice junctions in RNA-seq SNV calls. And that is when comparing against DNA-seq data, after making sure the variants have enough read coverage for a confident SNV call.
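One simple way to look at the splice-junction issue is to flag calls that sit within a few bases of a known junction and inspect or filter them separately. The sketch below is only an illustration of that idea: the function name `flag_near_junctions`, the `(chrom, pos)` tuple format and the 5 bp window are all assumptions, not something taken from a specific tool.

```python
from bisect import bisect_left
from collections import defaultdict


def flag_near_junctions(variants, junctions, window=5):
    """Mark each variant that lies within `window` bp of a splice junction.

    variants  : list of (chrom, pos) tuples for SNV calls (hypothetical format)
    junctions : list of (chrom, pos) tuples for exon boundary positions
    Returns a list of (chrom, pos, is_near_junction) tuples.
    """
    # Index junction positions per chromosome, sorted for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in junctions:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    flagged = []
    for chrom, pos in variants:
        positions = by_chrom.get(chrom, [])
        # First junction at or after (pos - window); it is "near" if it
        # also falls at or before (pos + window).
        i = bisect_left(positions, pos - window)
        near = i < len(positions) and positions[i] <= pos + window
        flagged.append((chrom, pos, near))
    return flagged


if __name__ == "__main__":
    junctions = [("chr1", 100), ("chr1", 500)]
    calls = [("chr1", 103), ("chr1", 300), ("chr2", 100)]
    for call in flag_near_junctions(calls, junctions):
        print(call)
```

Comparing the false-positive rate of flagged versus unflagged calls against the DNA-seq calls would show how much of the discordance is concentrated at junctions.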
Indeed, benchmarks have been done since this post. See them here in my answer: A: Inferring genotype based on RNA sequnces
If you're interested in inferring RNA editing from RNA-seq, be sure to read the responses to the recent Li et al. paper in Science on that topic, as well as the commentary published on the Genomes Unzipped blog.
I don't believe any of us should be encouraging variant calling from RNA-seq. Here is why:
A: Inferring genotype based on RNA sequnces
From Broad Institute on their RNA-seq variant calling pipeline:
Another benchmark, from studies that came out since my colleagues posted their answers (below):