Question

RNA-Seq Data Variant Calling

5

12.1 years ago

learnerforever ▴ 520

Has anyone tried calling variants from RNA-seq data and comparing them with WGS/exome sequencing variant calls in coding regions? I was curious to know whether the same variant callers can be used on RNA-seq alignments (say, TopHat alignments). Also, are there tools that can predict RNA-editing or similar events?

rna-seq tophat • 15k views

ADD COMMENT • link updated 12.0 years ago by Obi Griffith 20k • written 12.1 years ago by learnerforever ▴ 520

3

If you're interested in inferring RNA editing from RNA-seq, be sure to read the responses to the Li et al. paper on that topic recently published in Science, as well as the commentary published on the Genomes Unzipped blog.

ADD REPLY • link 12.1 years ago by David Quigley 11k

0

I don't believe any of us should be encouraging variant calling from RNA-seq. Here is why:

A: Inferring genotype based on RNA sequnces

From Broad Institute on their RNA-seq variant calling pipeline:

Finally, we know that the current recommended pipeline is producing both false positives (wrong variant calls) and false negatives (missed variants) errors. While some of those errors are inevitable in any pipeline, others are errors that we can and will address in future versions of the pipeline.

Another benchmark, from studies that came out since my colleagues posted their answers (below):

The situation appears even more alarming when one reads anecdotal and published evidence of people who have compared RNA-seq variant calls to whole exome seq (WES) variant calls. Scattered across the WWW, I've seen that RNA-seq variant calling can only detect between ~30% and ~70% of the variant calls that WES detects, and I assume that these people have obviously filtered the WES data to only include variants in exons in their comparisons.

ADD REPLY • link 5.3 years ago by Kevin Blighe 87k
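
The RNA-seq versus WES comparison described above boils down to asking what fraction of the WES variants also appear in the RNA-seq call set. A minimal sketch of that calculation follows, assuming both call sets are available as VCF-formatted text and that a variant is matched on (chromosome, position, ref, alt). The function names and the toy records are hypothetical and are not taken from any of the benchmarks mentioned.

```python
def load_variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF-formatted text lines."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and VCF header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic records list several ALT alleles separated by commas.
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def fraction_recovered(rna_keys, wes_keys):
    """Fraction of WES variants that are also present in the RNA-seq calls."""
    if not wes_keys:
        return 0.0
    return len(rna_keys & wes_keys) / len(wes_keys)


# Toy data (made up for illustration only).
wes_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t250\t.\tC\tT",
    "chr2\t300\t.\tG\tA",
    "chr2\t900\t.\tT\tC",
]
rna_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr2\t300\t.\tG\tA",
    "chr3\t50\t.\tA\tT",  # called in RNA-seq only
]

wes_keys = load_variant_keys(wes_vcf)
rna_keys = load_variant_keys(rna_vcf)
recovered = fraction_recovered(rna_keys, wes_keys)
print(f"RNA-seq recovered {recovered:.0%} of WES variants")
```

In a real comparison the WES set would first be restricted to exonic sites with adequate RNA-seq coverage; otherwise unexpressed genes count as misses and the fraction is not comparable between samples.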

score 7 · Answer 1 · 2012-04-03

You should check out the SNVMix papers here and here. They developed and used their method on RNA-seq tumor data and compared it to a "ground truth" of genotype arrays and WGS. They also showed their approach could identify RNA-editing events. And they have a follow-up method for matched tumor-normal samples called JointSNVMix, although I think the latter was developed more for exome-seq.

score 6 · Answer 2 · 2012-04-03

6

12.1 years ago

Vitis ★ 2.5k

We've done quite a bit of variant calling from mRNA-Seq data for EMS mutant identification, but we haven't compared the calls with WGS/exome data yet. We used BWA for mapping, and both samtools and the GATK pipeline for variant calling; the two yielded pretty consistent results. One thing that turned out to be very important for our purpose, i.e. detecting high-quality SNPs in coding regions, is that you have to trim aggressively to remove low-quality bases, even at the cost of losing coverage in some areas. With really stringent quality trimming, we've successfully identified several mutant alleles that could be verified by Sanger sequencing or restriction enzyme genotyping.

ADD COMMENT • link 12.1 years ago by Vitis ★ 2.5k
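
To make "aggressive quality trimming" concrete, here is a minimal sketch of one common approach: a running-sum trim of the 3' end of a read. It is an illustration only and not necessarily the procedure used in the answer above. The function name, the Phred+33 encoding, and the threshold of 20 are assumptions.

```python
def trim_3prime(seq, qual, threshold=20, offset=33):
    """Trim low-quality bases from the 3' end of a read.

    Walks backwards from the last base, accumulating (threshold - q).
    The read is cut at the position where that running sum peaks, and
    the walk stops once the sum turns negative (good bases dominate).
    `qual` is assumed to be a Phred+33 encoded quality string.
    """
    scores = [ord(c) - offset for c in qual]
    running, best, cut = 0, 0, len(seq)
    for i in range(len(seq) - 1, -1, -1):
        running += threshold - scores[i]
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return seq[:cut], qual[:cut]


# Toy read: six good bases (Q38, 'G') followed by four poor ones (Q10, Q5, Q3, Q2).
read_seq = "ACGTACGTAC"
read_qual = "GGGGGG+&$#"
trimmed_seq, trimmed_qual = trim_3prime(read_seq, read_qual)
print(trimmed_seq)
```

Raising `threshold` makes the trim more stringent. In practice one would also drop reads that end up shorter than some minimum length, which is where the loss of coverage mentioned above comes from.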

1

Thanks, we are following a similar approach as well, so it's good to know we are not alone :). What reference did you use for the BWA alignments? A custom transcriptome built from known transcripts (>150,000?), or some trick to get spliced alignments out of BWA?

ADD REPLY • link 12.1 years ago by learnerforever ▴ 520

1

For the mapping question, it is probably worth looking at: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btr427v1

ADD REPLY • link 12.1 years ago by Sean Davis 26k

0

We've been using predicted CDSs as references. Because our system is not highly annotated, we ignored alternative transcription for the moment. I tried genome mapping too and got very similar results, since you only lose the roughly 5% of reads that span junctions.

ADD REPLY • link 12.1 years ago by Vitis ★ 2.5k

0

What would you call consistent results? We routinely see an over-representation of FPs near splice junctions in RNA-seq SNV calls. And this is when comparing against DNA-seq data, making sure the variants have enough read coverage for a confident SNV call.

ADD REPLY • link 12.1 years ago by Bioinfosm ▴ 620
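
One way to inspect the splice-junction effect raised above is to separate calls that fall within a few bases of a known junction from the rest, and then compare false-positive rates between the two groups. A minimal sketch follows, assuming junction coordinates are already available per chromosome (for example, extracted from an annotation). The function names, the 5 bp window, and the toy coordinates are hypothetical.

```python
from bisect import bisect_left


def distance_to_nearest(sorted_positions, pos):
    """Distance from pos to the closest value in a sorted list, or None if empty."""
    i = bisect_left(sorted_positions, pos)
    candidates = []
    if i < len(sorted_positions):
        candidates.append(sorted_positions[i] - pos)  # nearest junction at or after pos
    if i > 0:
        candidates.append(pos - sorted_positions[i - 1])  # nearest junction before pos
    return min(candidates) if candidates else None


def split_by_junction_proximity(variants, junctions, window=5):
    """Split (chrom, pos) variants into those within `window` bp of a junction and the rest."""
    near, far = [], []
    for chrom, pos in variants:
        d = distance_to_nearest(junctions.get(chrom, []), pos)
        if d is not None and d <= window:
            near.append((chrom, pos))
        else:
            far.append((chrom, pos))
    return near, far


# Toy data (made up): junction coordinates must be sorted within each chromosome.
junctions = {"chr1": [1000, 2000]}
variants = [("chr1", 998), ("chr1", 1500), ("chr1", 2004), ("chr2", 1000)]

near, far = split_by_junction_proximity(variants, junctions)
print("near junctions:", near)
print("elsewhere:", far)
```

Calls in the `near` group can then be flagged or filtered, or simply counted against a DNA-seq truth set to check whether false positives are enriched there.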

0

Indeed, benchmarks have been done since this post. See them here in my answer: A: Inferring genotype based on RNA sequnces

ADD REPLY • link 5.3 years ago by Kevin Blighe 87k

0

Well, in our system, or more properly, at our evolutionary scale, CNVs are very rare. SNPs and indels make up the vast majority of variant types. I have no knowledge of or experience with CNVs.

ADD REPLY • link 12.1 years ago by Vitis ★ 2.5k

0

Do we not need to apply any normalization method before calling variants on mRNA-Seq data?