Question: Exon-Seq Mutation Detection
7.0 years ago by
United States
jeremy wrote:

Question about exon-seq data analysis for mutation detection.

I have exon-seq data and DNA-Seq data from the same samples. The matched normal data is whole-genome sequencing. I want to compare mutation detection between exon-seq and whole-genome sequencing.

Step 1. It makes more sense to map the exon-seq data to transcripts, with introns removed.
Step 2. To call exon-seq mutations against the matched normal, I need to convert the alignments from transcript (exon) coordinates into whole-genome coordinates, because the matched normal was aligned to the whole genome.

Questions:
1. Is there a pipeline that converts an aligned BAM/SAM file from transcript (exon) coordinates into whole-genome coordinates?
2. In general, what would be a better pipeline for calling mutations from exon-seq data? Or can anyone point me to resources on this issue?
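To make the conversion asked about in question 1 concrete, here is a minimal Python sketch of the underlying arithmetic. It assumes the exons are given as 0-based, half-open genomic intervals (as in BED); the function name `transcript_to_genome` and the example exons are hypothetical. It maps a single position only, not a whole alignment record.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 0-based, half-open genomic interval [start, end)


def transcript_to_genome(pos: int, exons: List[Exon], strand: str = "+") -> int:
    """Map a 0-based offset within a spliced transcript to a 0-based genomic position."""
    if pos < 0:
        raise ValueError("position must be non-negative")
    # Walk the exons in transcript order: ascending for '+', descending for '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = pos
    for start, end in ordered:
        length = end - start
        if remaining < length:
            # The offset falls inside this exon.
            return start + remaining if strand == "+" else end - 1 - remaining
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


# Hypothetical three-exon transcript.
exons = [(100, 150), (200, 260), (400, 420)]
print(transcript_to_genome(0, exons))       # first base of exon 1
print(transcript_to_genome(50, exons))      # first base of exon 2
print(transcript_to_genome(0, exons, "-"))  # last genomic base on the minus strand
```

Converting a full alignment is more involved than this: a read that crosses an exon junction would also need its CIGAR string split with a skipped-region (`N`) operation, which the sketch does not attempt.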

thanks in advance.


What exactly do you mean by exon-seq? Exome sequencing? RNA-Seq? Either way, I don't see that 'Step 1' is valid. Surely if you map everything to the same reference, no conversion is needed?

written 7.0 years ago by Daniel Swan

What if we are only interested in mutations within known exons? It would be faster to map to exons only instead of the whole genome. Also, when mapping to the whole genome, reads might be discarded because they align to multiple places. If mapped to exons only, some of these reads could be mapped uniquely. What do you think of these factors?

written 7.0 years ago by jeremy

I think that if a read maps better elsewhere in the genome than to the supposed capture target, I would rather it aligned there, in the hope that it doesn't introduce false-positive SNP calls.

written 7.0 years ago by Daniel Swan

In general, you will get the most accurate alignments if you align to what your sample really is. In exome capture, your sequence is genomic DNA, enriched for exons.

A reference made of exons with genomic padding would be more accurate than aligning to transcripts, but you might still get off-target reads forced to align wrongly to your reference. With the whole genome as the reference, they should align to the correct place.
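As an illustration of what "exons with genomic padding" means in terms of intervals, here is a minimal Python sketch that extends each exon by a fixed number of bases on both sides and merges any intervals that then overlap. The helper name `pad_and_merge`, the padding of 30 bases, and the example coordinates are all hypothetical; the sketch clamps at position 0 but does not know chromosome lengths, so it does not clamp at the far end.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def pad_and_merge(intervals: List[Interval], pad: int) -> List[Interval]:
    """Extend each interval by `pad` bases on both sides, then merge overlaps."""
    padded = sorted((max(0, start - pad), end + pad) for start, end in intervals)
    merged: List[Interval] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical exons on one chromosome; the first two fuse once padded.
exons = [(100, 150), (200, 260), (400, 420)]
print(pad_and_merge(exons, 30))
```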

You can use BEDTools after alignment to the whole genome to filter away the reads that don't align to target; that might make the file more manageable.
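The filtering step comes down to an interval-overlap test. The Python sketch below shows only that logic, on plain tuples rather than real BAM and BED files; BEDTools does this for you on the actual files. The names `build_index` and `on_target` and the example reads are hypothetical, and coordinates are assumed to be 0-based and half-open, as in BED.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(targets):
    """Group targets by chromosome and merge overlaps into sorted start/end lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def on_target(index, chrom, start, end):
    """Return True if the read interval [start, end) overlaps any target."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged target that begins before the read ends.
    i = bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start


# Hypothetical targets and aligned reads: (name, chrom, start, end).
targets = [("chr1", 100, 150), ("chr1", 200, 260)]
reads = [
    ("r1", "chr1", 90, 140),   # overlaps the first target
    ("r2", "chr1", 150, 200),  # falls in the gap between targets
    ("r3", "chr2", 100, 150),  # chromosome has no targets
    ("r4", "chr1", 259, 300),  # overlaps the last base of the second target
]
index = build_index(targets)
kept = [name for name, chrom, start, end in reads if on_target(index, chrom, start, end)]
print(kept)
```

Because the reads were aligned to the whole genome first, anything dropped here is a read that mapped off target, not one that was forced onto a target.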

If a read really aligns to multiple places in the genome, you want to know that! You don't want to just pretend that it must have come from your target, because it might not originate there. Exome capture is far from perfect.

written 7.0 years ago by swbarnes2
7.0 years ago by
United States
swbarnes2 wrote:

Do you mean exome capture? Exome capture data should be aligned to the genome, not to transcripts.

RNA-Seq can be aligned to transcripts, or to the genome with TopHat, which will attempt to span introns.
