Question: Rna-Seq Data Variant Calling
7.7 years ago by
learnerforever520 wrote:

Has anyone tried calling variants from RNA-seq data and comparing those with WGS/Exome sequencing variant calls in coding regions? I was curious to know if the same variant callers can be used on RNA-seq alignment (say TopHat alignments). Also, if there are tools that can predict RNA-editing or similar events.

tophat rna-seq • 12k views
ADD COMMENTlink modified 7.6 years ago by Obi Griffith18k • written 7.7 years ago by learnerforever520

If you're interested in inferring RNA-editing from RNA-seq, be sure to read the responses to the recent Li et al. paper on that topic in Science, as well as the commentary published on the Genomes Unzipped blog.

ADD REPLYlink written 7.7 years ago by David Quigley11k

I don't believe any of us should be encouraging variant calling from RNA-seq. Here is why:

A: Inferring genotype based on RNA sequnces

From Broad Institute on their RNA-seq variant calling pipeline:

Finally, we know that the current recommended pipeline is producing both false positives (wrong variant calls) and false negatives (missed variants) errors. While some of those errors are inevitable in any pipeline, others are errors that we can and will address in future versions of the pipeline.

Another benchmark, from studies that came out since my colleagues posted their answers (below):

The situation appears even more alarming when one reads the anecdotal and published evidence from people who have compared RNA-seq variant calls to whole-exome sequencing (WES) variant calls. In reports scattered across the web, RNA-seq variant calling detects only between ~30% and ~70% of the variants that WES detects, and I assume these people filtered the WES data to include only exonic variants in their comparisons.
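For readers who want to run this kind of comparison themselves, here is a minimal sketch of how the "fraction of WES variants recovered" figure could be computed. It assumes both call sets have already been restricted to the same exonic regions and reduced to (chrom, pos, ref, alt) tuples; the function names and the toy variants are hypothetical and are not taken from any of the studies mentioned.

```python
def variant_key(chrom, pos, ref, alt):
    """Normalise one call so the two call sets can be compared as sets."""
    return (str(chrom), int(pos), ref.upper(), alt.upper())


def fraction_recovered(rna_calls, wes_calls):
    """Fraction of WES variants that are also present in the RNA-seq calls.

    WES is treated as the reference set here, which is an assumption:
    WES calls contain errors too.
    """
    rna = {variant_key(*v) for v in rna_calls}
    wes = {variant_key(*v) for v in wes_calls}
    if not wes:
        raise ValueError("WES call set is empty; nothing to compare against")
    return len(rna & wes) / len(wes)


# Toy data, invented for illustration only.
wes_calls = [
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 10, "T", "C"),
]
rna_calls = [
    ("chr1", 100, "a", "g"),   # same variant, lower-case alleles
    ("chr2", 50, "G", "A"),
    ("chr5", 999, "C", "G"),   # RNA-only call (possible FP or editing site)
]

print(fraction_recovered(rna_calls, wes_calls))  # 0.5
```

Note that this only measures recovery of WES variants; calls seen only in RNA-seq need separate inspection.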

ADD REPLYlink modified 11 months ago • written 15 months ago by Kevin Blighe52k
7.7 years ago by
Obi Griffith18k
Washington University, St Louis, USA
Obi Griffith18k wrote:

You should check out the SNVMix papers here and here. They developed and used their method on RNA-seq tumor data and compared it to the "ground truth" of genotype arrays and WGS. They also showed that their approach could identify RNA-editing events. They also have a follow-up method for matched tumor-normal samples called JointSNVMix, although I think that one was developed more for exome-seq.

ADD COMMENTlink written 7.7 years ago by Obi Griffith18k
7.7 years ago by
Vitis2.3k
New York
Vitis2.3k wrote:

We've done quite a bit of variant calling from mRNA-Seq data for EMS mutant identification, but we haven't compared it with WGS/exome data yet. We used BWA for mapping, and both samtools and the GATK pipeline for variant calling. The two yielded pretty consistent results. One thing that turned out to be very important for our purpose, i.e. detecting high-quality SNPs in coding regions, is that you have to trim aggressively to remove low-quality bases, even at the cost of losing coverage in some areas. With really stringent quality trimming, we've successfully identified several mutant alleles that could be verified by Sanger sequencing or restriction enzyme genotyping.
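To illustrate what aggressive quality trimming means in practice, here is a minimal sketch that removes low-quality bases from the 3' end of a read. It is not the tool used above (none is named); the Phred+33 encoding, the cutoff of 20, and the function name are all assumptions chosen for the example.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Strip trailing bases whose Phred quality is below min_q.

    seq    -- read sequence
    qual   -- FASTQ quality string, assumed Phred+33 encoded
    min_q  -- illustrative cutoff; raise it for more stringent trimming
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk back from the 3' end until a base passes the cutoff.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(trim_3prime("ACGTACGT", "IIIII###"))  # ('ACGTA', 'IIIII')
```

A real pipeline would normally use a dedicated trimmer and also discard reads that become too short after trimming.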

ADD COMMENTlink written 7.7 years ago by Vitis2.3k

Thanks, we are following a similar approach as well, so it's good to know we are not alone :). What reference did you use for the BWA alignments? A custom transcriptome built from known transcripts (>150,000?), or some trick to get spliced alignments out of BWA?

ADD REPLYlink written 7.7 years ago by learnerforever520

For the mapping question, it is probably worth looking at: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btr427v1

ADD REPLYlink written 7.7 years ago by Sean Davis25k

We've been using predicted CDSs as references. Because our system is not highly annotated, we ignored alternative transcription for the moment. I tried genome mapping too and got very similar results, as you only lose the roughly 5% of reads that span junctions.

ADD REPLYlink written 7.7 years ago by Vitis2.3k

What would you call consistent results? We routinely see an over-representation of false positives near splice junctions in RNA-seq SNV calls. And this is when comparing against DNA-seq data, after making sure the variants have enough read coverage for a confident SNV call.
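One simple way to check for this effect is to flag calls that fall close to a known junction and look at them separately. The sketch below assumes junction coordinates for one chromosome are available as a sorted list of positions; the 5 bp window and the function name are arbitrary choices for illustration, not values recommended anywhere in this thread.

```python
from bisect import bisect_left


def near_junction(pos, sorted_junctions, window=5):
    """Return True if pos lies within `window` bp of any junction.

    sorted_junctions must be sorted ascending and contain positions
    for the same chromosome as pos.
    """
    # First junction at or beyond the left edge of the window.
    i = bisect_left(sorted_junctions, pos - window)
    return i < len(sorted_junctions) and sorted_junctions[i] <= pos + window


junctions = [100, 500]  # toy junction positions
for pos in (103, 110, 495):
    print(pos, near_junction(pos, junctions))
# 103 True
# 110 False
# 495 True
```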

ADD REPLYlink written 7.7 years ago by Bioinfosm620

Indeed, benchmarks have been done since this post. See them here in my answer: A: Inferring genotype based on RNA sequnces

ADD REPLYlink written 11 months ago by Kevin Blighe52k

Well, in our system, or more properly at our evolutionary scale, CNVs are very rare. SNPs and indels make up the vast majority of variant types. I have no experience with CNVs.

ADD REPLYlink written 7.7 years ago by Vitis2.3k

Do we not need to apply any normalization method before calling variants on mRNA-Seq data?

ADD REPLYlink written 7.3 years ago by bharati.mehani0

Just do targeted DNA-seq.

ADD REPLYlink written 11 months ago by Kevin Blighe52k