Exon-Seq Mutation Detection
11.2 years ago
jeremy ▴ 80

Question about exon-seq data analysis for mutation detection.

I have some exon-seq data and DNA-Seq data from the same samples. The matched normal was sequenced as a whole genome. I want to compare mutation detection between exon-seq and whole-genome sequencing.

Step 1. It makes more sense to me to map the exon-seq data to transcripts, i.e. with the introns removed.
Step 2. When comparing against the matched normal to call exon-seq mutations, I need to convert the alignment results from transcript (exon) coordinates into whole-genome coordinates, because the matched normal was aligned to the whole genome.
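To make Step 2 concrete, here is a minimal Python sketch of the per-position conversion, not an existing pipeline. It assumes a hypothetical exon model given as 0-based half-open genomic `(start, end)` pairs listed in ascending genomic order, plus the transcript strand; the function name `transcript_to_genome` is illustrative only.

```python
from typing import List, Tuple

# Hypothetical exon model: (genomic_start, genomic_end), 0-based half-open,
# listed in ascending genomic order regardless of strand.
Exon = Tuple[int, int]


def transcript_to_genome(tx_pos: int, exons: List[Exon], strand: str) -> int:
    """Map a 0-based transcript coordinate to a 0-based genomic coordinate."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Walk exons in transcript (5' to 3') order: ascending on '+', descending on '-'.
    ordered = exons if strand == "+" else list(reversed(exons))
    offset = tx_pos
    for start, end in ordered:
        length = end - start
        if offset < length:
            # The position falls inside this exon.
            return start + offset if strand == "+" else end - 1 - offset
        offset -= length
    raise ValueError("transcript position is beyond the transcript length")
```

For example, with exons `[(100, 110), (200, 205)]` on the plus strand, transcript position 10 maps to genomic position 200. Converting whole alignments is harder than converting single positions: a read that crosses an exon junction on the transcript becomes a split alignment on the genome, which is one reason the replies below suggest aligning to the genome directly.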

Questions:
1. Is there a pipeline that can convert the transcript exon coordinates in an aligned BAM/SAM file into whole-genome coordinates?
2. In general, what would be a better pipeline for calling mutations from exon-seq data? Or can anyone point me to some resources on this issue?

Thanks in advance.


What exactly do you mean by exon-seq? Exome sequencing? RNA-Seq? Either way, I don't see why Step 1 is needed. Surely if you just map everything to the same reference, no conversion is needed?


What if we are only interested in mutations within known exons? It would be faster to map to exons only instead of the whole genome. Also, when mapping to the whole genome, reads might be discarded because they align to multiple locations, whereas if we map to exons only, some of these reads could map uniquely. What do you think of these factors?


If a read maps better elsewhere in the genome than to the supposed capture target, I would rather it aligned there, so that it doesn't introduce false-positive SNP calls.


In general, you will get the most accurate alignments if you align to what your sample really is. In exome capture, what you sequence is genomic DNA enriched for exons.

A reference made of exons plus flanking genomic padding would be more accurate than aligning to transcripts, but off-target reads could still be forced to align wrongly to it. With the whole genome as the reference, they should align to the correct place.
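As an illustration of what "exons with genomic padding" means, here is a minimal Python sketch that extends each exon interval and merges the overlaps, roughly what `bedtools slop` followed by `bedtools merge` would do. The function name `pad_and_merge` and the BED-style tuple layout are assumptions for this example, and unlike `bedtools slop` it only clips at position 0, because it is not given chromosome lengths.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def pad_and_merge(exons: List[Interval], pad: int) -> List[Interval]:
    """Extend each exon by `pad` bp on both sides, then merge overlapping or touching intervals."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in exons:
        # Clip at 0 only; clipping at the chromosome end would need chromosome lengths.
        by_chrom.setdefault(chrom, []).append((max(0, start - pad), end + pad))

    merged: List[Interval] = []
    for chrom in sorted(by_chrom):
        cur_start: Optional[int] = None
        cur_end: Optional[int] = None
        for start, end in sorted(by_chrom[chrom]):
            if cur_end is not None and start <= cur_end:
                # Overlaps or touches the current block: extend it.
                cur_end = max(cur_end, end)
            else:
                if cur_end is not None:
                    merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        if cur_end is not None:
            merged.append((chrom, cur_start, cur_end))
    return merged
```

With a small pad, nearby exons stay separate; with a large one, they collapse into a single padded block. Either way, reads originating outside these blocks have nowhere correct to align, which is the concern raised above.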

After aligning to the whole genome, you can use BEDTools to filter out reads that don't align to the target regions; that might make the file more manageable.
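The on-target test behind that filtering is an interval-overlap check. In practice you would use BEDTools itself (for example `bedtools intersect` with `-u`); the following is only a minimal Python sketch of the idea, assuming the targets are already merged and non-overlapping (BED-style, 0-based half-open) and that each read is reduced to its reference span. `build_index` and `on_target` are hypothetical names.

```python
import bisect
from typing import Dict, List, Tuple

Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(targets: List[Tuple[str, int, int]]) -> Index:
    """Index merged, non-overlapping BED intervals per chromosome for binary search."""
    grouped: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in targets:
        grouped.setdefault(chrom, []).append((start, end))
    index: Index = {}
    for chrom, intervals in grouped.items():
        intervals.sort()
        # Because intervals do not overlap, the end coordinates are sorted too.
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def on_target(chrom: str, start: int, end: int, index: Index) -> bool:
    """True if the half-open read span [start, end) overlaps any target interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Rightmost target whose start lies before the read end; it has the largest end
    # among all candidates, so checking it alone is sufficient.
    i = bisect.bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start
```

Binary search keeps each lookup logarithmic in the number of targets, which matters when checking millions of reads against an exome-sized target list.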

If a read really aligns to multiple places in the genome, you want to know that! You don't want to just pretend that it must have come from your target, because it might not originate there. Exome capture is far from perfect.
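One way to see which reads are ambiguous, rather than silently keeping them, is to inspect the SAM FLAG and MAPQ columns. This minimal Python sketch flags reads that have a secondary alignment (FLAG bit 0x100) or a low mapping quality. The threshold `min_mapq=1` is an assumption: MAPQ semantics differ between aligners, so pick a cutoff appropriate to yours. `parse_sam_line` and `ambiguous_reads` are hypothetical helpers.

```python
from typing import Iterable, Set, Tuple

UNMAPPED = 0x4     # SAM FLAG bit: segment unmapped
SECONDARY = 0x100  # SAM FLAG bit: secondary alignment


def parse_sam_line(line: str) -> Tuple[str, int, int]:
    """Return (QNAME, FLAG, MAPQ) from one SAM alignment line (columns 1, 2 and 5)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[4])


def ambiguous_reads(lines: Iterable[str], min_mapq: int = 1) -> Set[str]:
    """Names of mapped reads with a secondary alignment or MAPQ below min_mapq."""
    flagged: Set[str] = set()
    for line in lines:
        if line.startswith("@"):
            continue  # skip header lines
        qname, flag, mapq = parse_sam_line(line)
        if flag & UNMAPPED:
            continue  # unmapped reads are not multi-mapped
        if flag & SECONDARY or mapq < min_mapq:
            flagged.add(qname)
    return flagged
```

Keeping the list of flagged reads, instead of forcing them onto the target, lets you check later whether a candidate variant is supported mainly by reads that could have come from elsewhere.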

11.2 years ago

Do you mean exome capture? Exome capture should be aligned to genome, not transcripts.

RNA-seq can be aligned to transcripts, or to the genome with TopHat, which will attempt to span introns.
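In SAM/BAM output, spliced aligners such as TopHat represent an intron-spanning gap with the CIGAR `N` (skipped region) operation. This minimal Python sketch, with the hypothetical name `aligned_blocks`, shows how such an alignment maps back onto genomic blocks. It splits only at `N`, so deletions (`D`) stay inside a block.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Reference blocks (0-based half-open) covered by an alignment.

    pos is the 1-based SAM POS; an 'N' operation (e.g. an intron) ends one block
    and starts the next after the skipped region.
    """
    blocks: List[Tuple[int, int]] = []
    ref = pos - 1
    block_start = ref
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=XD":
            ref += length  # these operations consume the reference
        elif op == "N":
            if ref > block_start:
                blocks.append((block_start, ref))
            ref += length
            block_start = ref
        # I, S, H and P do not consume the reference.
    if ref > block_start:
        blocks.append((block_start, ref))
    return blocks
```

For example, a read at POS 101 with CIGAR `20M500N30M` covers two exonic blocks separated by a 500 bp intron. This is also why exome-capture reads, which come from genomic DNA, should not need spliced alignment at all.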
