Question

A Question About Annotation To Exome Seq

1

11.8 years ago

camelbbs ▴ 710

Hi,

When I use ANNOVAR to annotate exome-seq data, I get two files: annovar.variant and annovar.exonic_variant.

My question is: if exome sequencing only targets exonic regions, why does ANNOVAR also report variants in other regions (intronic, intergenic, etc.)?

Thanks.

exome sequencing • 3.6k views

ADD COMMENT • link updated 10.3 years ago by Biostar 20 • written 11.8 years ago by camelbbs ▴ 710

score 5 · Answer 1 · 2012-08-02

5

11.8 years ago

Matt Shirley 10k

Most exon capture methods for enrichment prior to "exome" sequencing actually capture something more than what they target. This makes sense, if you consider the fact that the oligos that target specific sequences do not have to be anywhere near as large as the fragment of DNA they capture. There is a great review of three current capture platforms I would urge you to read. Also, "exonic" is defined by the regions you supply. If this is RefSeq genes, then it will be somewhat more conservative than something like UCSC genes.
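To see how the calls in your own data split across region types, a minimal sketch could tally the first column of the ANNOVAR gene-annotation output. This assumes the file is tab-delimited with the region label (exonic, intronic, intergenic, ...) in the first column; the sample rows and the function name `tally_regions` are made up for illustration.

```python
from collections import Counter


def tally_regions(lines):
    """Count variants per region label, assuming the label is the first tab-separated field."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        region = line.split("\t", 1)[0]
        counts[region] += 1
    return counts


# Hypothetical rows in the assumed layout: region, gene, then the variant fields.
example_rows = [
    "exonic\tGENE1\t1\t1000\t1000\tA\tG",
    "intronic\tGENE1\t1\t1500\t1500\tC\tT",
    "intronic\tGENE2\t2\t300\t300\tG\tA",
    "intergenic\tGENE2,GENE3\t2\t9000\t9000\tT\tC",
]

for region, n in tally_regions(example_rows).most_common():
    print(region, n)
```

On a real file you would pass the open file handle instead of `example_rows`. A large share of intronic calls close to exon boundaries would be consistent with captured fragments extending past the targeted sequence.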

ADD COMMENT • link 11.8 years ago by Matt Shirley 10k

1

This paper also has a comparison of WES vs WGS, for those who find such things interesting...

ADD REPLY • link 11.8 years ago by Alex Paciorkowski 3.5k

0

The annovar.variant result has 144k rows, but the annovar.exonic_variant result has only about 15k rows. That is what prompted my question.

Thanks. I will look up the papers.

ADD REPLY • link 11.8 years ago by camelbbs ▴ 710

0

If you haven't done so already, I would use something like Picard to calculate your target region stats:

http://picard.sourceforge.net/picard-metric-definitions.shtml#HsMetrics

My guess is that you'll probably see a fair amount of off-target reads, especially if you remove duplicates (for example, I think you are doing pretty well if you get 60% on-target unique reads).
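As a rough illustration of what "on-target" means here, the sketch below counts the fraction of reads whose start position falls inside a target interval. It is a simplification and not how Picard computes its metrics (Picard works from aligned bases and bait/target interval lists); the function name, the half-open interval convention, and the toy data are all assumptions.

```python
def on_target_fraction(reads, targets):
    """Fraction of reads whose position lies in any target interval.

    reads:   iterable of (chrom, pos)
    targets: iterable of (chrom, start, end), half-open [start, end)
    """
    by_chrom = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start, end))

    total = 0
    on_target = 0
    for chrom, pos in reads:
        total += 1
        if any(start <= pos < end for start, end in by_chrom.get(chrom, [])):
            on_target += 1
    return on_target / total if total else 0.0


# Toy data: two target intervals and five reads, two of which fall inside a target.
toy_targets = [("chr1", 100, 200), ("chr2", 500, 600)]
toy_reads = [("chr1", 150), ("chr1", 250), ("chr2", 599), ("chr2", 600), ("chr3", 10)]

print(on_target_fraction(toy_reads, toy_targets))
```

The linear scan over intervals is fine for a toy example; for a real target list you would want an interval index or an existing tool.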

It is also worth taking into consideration your total number of reads. Let's say one run gives you 80x on-target and 10x off-target coverage, and another run with half the reads gives you 40x on-target and 5x off-target. Doubling the coverage probably results in the same on-target variants, but you will have much higher power to detect off-target variants.
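To put rough numbers on that, here is a back-of-the-envelope sketch using a simple binomial model. The assumptions are mine and not from any particular caller: the site is heterozygous, each read shows the alternate allele with probability 0.5, depth is exactly the stated coverage, and a call needs at least 3 alternate reads.

```python
from math import comb


def detection_prob(depth, min_alt=3, alt_fraction=0.5):
    """P(at least min_alt alternate reads) at a site covered by `depth` reads."""
    p_too_few = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min(min_alt, depth + 1))
    )
    return 1.0 - p_too_few


for depth in (5, 10, 40, 80):
    print(f"{depth}x: {detection_prob(depth):.4f}")
```

Under these assumptions, going from 5x to 10x roughly doubles the chance of seeing enough alternate reads (about 0.50 to about 0.95), while going from 40x to 80x changes almost nothing because both are already close to 1. Real callers also weigh base and mapping quality, so treat this only as an illustration of the trend.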

ADD REPLY • link 10.3 years ago by Charles Warden 8.2k