Does Whole Exome Sequencing Include the Mitochondrial Genome?
2.6 years ago
DNAngel ▴ 210

I have exome sequencing data and, as far as I know, it should include all exonic regions, including those of the mitochondrial genome. When I BLAST my sequences to recover mitochondrial protein-coding genes, I get a ton of intermittent stop codons and a lot of gaps, which I generally do not find when extracting nuclear genes. The literature tells me exome sequencing includes all exonic regions, so I am wondering whether this is just a coverage problem or something else. I don't generally work with mitogenomes, but after extracting the exons of the popular COI gene and aligning everything to the reference sequence, the columns of stop codons make no sense to me.

mitogenome exome whole exome sequencing • 1.4k views

Why don't you check the coverage of the mitochondrial genes, counting only reads with a reasonable, non-multimapping MAPQ (say > 20 or so), to see whether those genes are included?
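A rough sketch of what that check could look like, in plain Python on SAM-format text. The function name `mean_depth`, the region tuple layout, and the default threshold of 20 are illustrative assumptions, not part of any particular tool; in practice you would normally let a dedicated program such as samtools do this on the BAM file.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def mean_depth(sam_lines, regions, min_mapq=20):
    """Mean depth per region, using only reads with MAPQ >= min_mapq.

    sam_lines: iterable of SAM text lines (header lines start with '@').
    regions:   list of (chrom, start, end, name), BED-style 0-based half-open.
    """
    covered = defaultdict(int)  # region name -> aligned bases falling inside it
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom = int(fields[1]), fields[2]
        pos, mapq, cigar = int(fields[3]), int(fields[4]), fields[5]
        # Skip unmapped reads (flag 0x4) and low-MAPQ (likely multimapping) reads.
        if flag & 0x4 or mapq < min_mapq or cigar == "*":
            continue
        ref_pos = pos - 1  # SAM POS is 1-based; convert to 0-based
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op in "M=X":  # aligned bases contribute to depth
                for rchrom, start, end, name in regions:
                    if rchrom == chrom:
                        overlap = min(ref_pos + length, end) - max(ref_pos, start)
                        if overlap > 0:
                            covered[name] += overlap
            if op in "MDN=X":  # operations that consume the reference
                ref_pos += length
    return {name: covered[name] / (end - start)
            for _, start, end, name in regions}
```

Running it once with `min_mapq=20` and once with `min_mapq=0` shows how much of the apparent mitochondrial coverage comes from ambiguously placed reads.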


As you probably know, there are different kits for WES, so please be as specific as possible. The design files should be available for each kit, although some will be easier to find than others. In general I would expect those genes to be included, although targeting them might lead to severely unbalanced coverage.

2.6 years ago
h.mon 33k

As noted by WouterDeCoster, you should state the capture kit (and search for its documentation), as different kits include different capture probes. Illumina exome kits apparently include mitochondrial genes.

What you may be observing is the presence of numts (nuclear copies of mtDNA); see "“COI-like” Sequences Are Becoming Problematic in Molecular Systematic and DNA Barcoding Studies" for an introduction to numts. The 1000 Genomes Project paper ("A global reference for human genetic variation") found that, on average, a typical human genome has 4 numts.
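As a quick screen, you could count in-frame internal stop codons in each extracted sequence under a mitochondrial genetic code. This is only a sketch: the function name `internal_stops` is made up for illustration, the input is assumed to be an ungapped coding sequence already in frame, and you need to pick the stop set that matches your organism (NCBI translation table 2 for vertebrate mitochondria, table 5 for invertebrate mitochondria).

```python
# Stop codons per genetic code (DNA alphabet).
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})         # NCBI table 1
VERT_MT_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})   # NCBI table 2
INVERT_MT_STOPS = frozenset({"TAA", "TAG"})               # NCBI table 5

def internal_stops(seq, stops=VERT_MT_STOPS, frame=0):
    """Return (offset, codon) for every in-frame stop except the final codon.

    seq is assumed to be an ungapped nucleotide sequence; frame is 0, 1 or 2.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # The last complete codon may be the legitimate terminal stop, so skip it.
    return [(frame + 3 * i, codon)
            for i, codon in enumerate(codons[:-1])
            if codon in stops]
```

Note that TGA codes for tryptophan in both mitochondrial tables, so translating mitochondrial genes with the standard code will by itself produce apparent stops. Sequences that still show internal stops under the right mitochondrial table are better numt candidates.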

2.6 years ago

The short answer is no. I don't believe most exome kits intentionally target mitochondrial sequence: if they did, there would be mt variants in projects like ExAC, but there aren't. Maybe that's a new development for Illumina kits. I looked at the TruSeq Exome Targeted Regions Manifest v1.2 (BED format), and I did see these chrM regions:

chrM    3306    4262    CEX-chrM-3307-4262
chrM    4469    5511    CEX-chrM-4470-5511
chrM    5903    7445    CEX-chrM-5904-7445
chrM    7585    8266    CEX-chrM-7586-8266
chrM    8365    9204    CEX-chrM-8366-9204
chrM    9206    9990    CEX-chrM-9207-9990
chrM    10058   10404   CEX-chrM-10059-10404
chrM    10469   12137   CEX-chrM-10470-12137
chrM    12336   14145   CEX-chrM-12337-14145
chrM    14148   14673   CEX-chrM-14149-14673
chrM    14746   15887   CEX-chrM-14747-15887
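To get a feel for how much of the mitochondrial genome those targets span, here is a small sketch that parses the lines above as BED (0-based, half-open). The helper names are made up for illustration, and two values are assumptions not stated in the manifest: a chrM length of 16,569 bp (the rCRS reference) and MT-CO1 (COI) at rCRS positions 5904-7445.

```python
MANIFEST = """\
chrM    3306    4262    CEX-chrM-3307-4262
chrM    4469    5511    CEX-chrM-4470-5511
chrM    5903    7445    CEX-chrM-5904-7445
chrM    7585    8266    CEX-chrM-7586-8266
chrM    8365    9204    CEX-chrM-8366-9204
chrM    9206    9990    CEX-chrM-9207-9990
chrM    10058   10404   CEX-chrM-10059-10404
chrM    10469   12137   CEX-chrM-10470-12137
chrM    12336   14145   CEX-chrM-12337-14145
chrM    14148   14673   CEX-chrM-14149-14673
chrM    14746   15887   CEX-chrM-14747-15887
"""

def parse_bed(text):
    """Parse whitespace-separated BED lines into (chrom, start, end, name)."""
    intervals = []
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, start, end, name = line.split()[:4]
        intervals.append((chrom, int(start), int(end), name))
    return intervals

def targeted_bases(intervals):
    """Total targeted length; assumes the intervals do not overlap."""
    return sum(end - start for _, start, end, _ in intervals)

def overlapping_targets(intervals, chrom, start, end):
    """Targets overlapping the 0-based half-open query [start, end)."""
    return [iv for iv in intervals
            if iv[0] == chrom and iv[1] < end and start < iv[2]]

intervals = parse_bed(MANIFEST)
MT_LENGTH = 16569              # assumption: rCRS reference length
MT_CO1 = ("chrM", 5903, 7445)  # assumption: rCRS 5904-7445 as 0-based half-open

print(len(intervals), "targets,", targeted_bases(intervals), "bp targeted")
print("fraction of chrM:", round(targeted_bases(intervals) / MT_LENGTH, 3))
print("targets overlapping COI:", overlapping_targets(intervals, *MT_CO1))
```

Swap in the manifest for your own kit to see whether COI or any other gene of interest is on the target list at all.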


Mitochondrial DNA outnumbers nuclear DNA by something like 1000 to 1 in copy number, but the chemistry must be sufficiently different to keep these sequences out of most exome kits; we're only talking about 13 protein-coding genes anyway.

If you are dead set on using exomes, the off-target reads (accidental captures) can still tell you a lot.