Question

Exon-Seq Mutation Detection

1

11.2 years ago

jeremy ▴ 80

Question about exon-seq data analysis for mutation detection.

I have exon-seq data and DNA-Seq data from the same samples. The matched normal was sequenced whole-genome. I want to compare mutation detection between exon-seq and whole-genome sequencing.

Step 1. It seems to make more sense to map exon-seq data to transcripts with the introns removed.

Step 2. To call exon-seq mutations against the matched normal, I need to convert the alignment results from transcript/exon coordinates into whole-genome coordinates, because the matched normal was aligned to the whole genome.

Questions:

1. Is there a pipeline that can convert an aligned BAM/SAM file from transcript/exon coordinates into whole-genome coordinates?
2. In general, what would be a better pipeline for calling mutations from exon-seq data? Or can anyone point me to some resources on this issue?

Thanks in advance.
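For reference, the coordinate conversion asked about in question 1 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not an existing pipeline: the function name `transcript_to_genome`, the 0-based half-open exon intervals, and the example exons are all assumptions made for this sketch. It maps a single transcript offset; converting a whole BAM/SAM file would also require rewriting CIGAR strings for reads that cross exon junctions.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # assumed 0-based, half-open genomic interval (start, end)


def transcript_to_genome(pos: int, exons: List[Exon], strand: str = "+") -> int:
    """Map a 0-based offset in a spliced transcript to a 0-based genomic position."""
    if pos < 0:
        raise ValueError("transcript position must be non-negative")
    # Walk exons in transcript order: ascending for '+', descending for '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = pos
    for start, end in ordered:
        length = end - start
        if remaining < length:
            # On the minus strand the transcript runs from exon end to exon start.
            return start + remaining if strand == "+" else end - 1 - remaining
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


# Hypothetical transcript with two exons on one chromosome
example_exons = [(100, 200), (300, 350)]
print(transcript_to_genome(120, example_exons))       # offset falls in the second exon
print(transcript_to_genome(0, example_exons, "-"))    # first base of a minus-strand transcript
```

As the replies below argue, aligning directly to the genome avoids needing this step at all.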

mutation • 3.3k views

ADD COMMENT • link updated 11.2 years ago by swbarnes2 14k • written 11.2 years ago by jeremy ▴ 80

3

What exactly do you mean by exon-seq? Exome sequencing? RNA-Seq? Either way, I don't see that 'Step 1' is valid. Just map everything to the same reference; then no conversion is needed, surely?

ADD REPLY • link 11.2 years ago by User 59 13k

0

What if we are only interested in mutations within known exons? It would be faster to map to exons only instead of the whole genome. Also, when mapping to the whole genome, reads might be discarded because they align to multiple locations, whereas some of those reads could map uniquely if aligned to exons only. What do you think of these factors?

ADD REPLY • link 11.2 years ago by jeremy ▴ 80

2

I think that if a read maps better elsewhere in the genome than to the supposed capture target, I would rather it aligned there, in the hope that it doesn't introduce false-positive SNP calls.

ADD REPLY • link 11.2 years ago by User 59 13k

1

In general, you will get the most accurate alignments if you align to what your sample really is. In exome capture, your sample is genomic DNA, enriched for exons.

A reference made up of exons with genomic padding would be more accurate than aligning to transcripts, but you might still get off-target reads forced to align wrongly to your reference. With a whole-genome reference, they should align to the correct place.

You can use BEDTools after alignment to the whole genome to filter away the reads that don't align to target; that might make the file more manageable.
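The on-target filtering idea can be illustrated with a small Python sketch. This is not BEDTools itself, just the overlap test it is built around; in practice the BEDTools `intersect` subcommand does this on real BAM and BED files. The helper names `index_targets` and `on_target`, and the example intervals, are hypothetical and assume 0-based half-open coordinates as used in BED.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), assumed 0-based half-open


def index_targets(targets: List[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group capture-target intervals by chromosome for lookup."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start, end))
    return by_chrom


def on_target(read: Interval, by_chrom: Dict[str, List[Tuple[int, int]]]) -> bool:
    """Return True if the aligned read span overlaps any target by at least one base."""
    chrom, start, end = read
    return any(t_start < end and start < t_end
               for t_start, t_end in by_chrom.get(chrom, []))


# Hypothetical capture targets and read spans taken from a whole-genome alignment
targets = [("chr1", 1000, 1200), ("chr2", 500, 800)]
reads = [("chr1", 1150, 1250), ("chr1", 1200, 1300), ("chr3", 10, 110)]

index = index_targets(targets)
kept = [r for r in reads if on_target(r, index)]
print(kept)
```

The linear scan per read is fine for a toy example; real tools use sorted or tree-based interval indexes to handle millions of reads.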

If a read really aligns to multiple places in the genome, you want to know that! You don't want to just pretend that it must have come from your target, because it might not originate there. Exome capture is far from perfect.
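One way to see which reads fall into this category is to look at the mapping quality (MAPQ, the fifth column of a SAM record). The sketch below flags mapped reads with a low MAPQ. It assumes the aligner reports ambiguous placements with MAPQ 0, which is aligner-specific and should be checked against the aligner's documentation; the function name `is_ambiguous` and the two SAM records are made up for illustration.

```python
def is_ambiguous(sam_line: str, min_mapq: int = 1) -> bool:
    """Return True for a mapped read whose MAPQ is below min_mapq.

    Assumption: the aligner assigns MAPQ 0 to reads that align equally
    well to several places. This convention varies between aligners.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: MAPQ
    if flag & 0x4:          # 0x4 set means the read is unmapped
        return False
    return mapq < min_mapq


# Hypothetical SAM records (SEQ and QUAL replaced by '*')
unique_read = "read1\t0\tchr1\t101\t60\t50M\t*\t0\t0\t*\t*"
multi_read = "read2\t0\tchr7\t5001\t0\t50M\t*\t0\t0\t*\t*"

for line in (unique_read, multi_read):
    name = line.split("\t")[0]
    print(name, "ambiguous" if is_ambiguous(line) else "confident")
```

Reads flagged this way are the ones that would have been forced onto the target if only exons were used as the reference.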

ADD REPLY • link 11.2 years ago by swbarnes2 14k

score 3 · Answer 1 · 2013-01-28

3

11.2 years ago

swbarnes2 14k

Do you mean exome capture? Exome capture reads should be aligned to the genome, not to transcripts.

RNA-Seq reads can be aligned to transcripts, or to the genome with TopHat, which will attempt to span introns.

ADD COMMENT • link 11.2 years ago by swbarnes2 14k