Strange coverage of mtDNA - need help interpreting MiSeq results
2
2
Entering edit mode
4.1 years ago
tatiatcg ▴ 20

We sequenced human mtDNA for multiple individuals. Our method involved making 2 partially overlapping PCR amplicons of mtDNA and then sequencing them on Illumina MiSeq (paired-end, ~350 average insert size).

I removed adapters, mapped the reads to hg38 reference using bwa and I am observing a strange coverage drop at position 3,234. It is present at this exact position in all samples (multiple separate batches) but in some samples the coverage drops from several hundred fold to 10-15x while in others it only drops to about half the coverage of the neighboring region:

coverage of 2 of the samples

This position is in the middle of one of the PCR amplicons and not near any of the primers used to make the amplicons.

I am trying to figure out what is happening.

1) Does the shape of the coverage track indicate anything? I am confused why the coverage drops so suddenly at position 3,234 (in a straight vertical line) but then gradually increases from almost 0 starting at position 3,235. How do I interpret that?

2) What should I try to figure this out (things I have already done are below)?. I'd like to solve this in silico but if I can't, will PCR amplify the region.

Here is what I have done so far:

1) checked primer specificity using BLAST - none of the primer sequences match anywhere besides the intended positions on human mtDNA

2) de novo assembled several samples - in all cases the region downstream of position 3,234 didn't assemble correctly (when I map reads back to the assembly, that region looks messy and is not uniformly covered; when I BLAST the part of the assembly downstream of 3,234 it aligns to various parts of mtDNA that are not supposed to be near position 3,234 but there is no clear pattern between different samples

3) looked at soft-clipped reads on both sides of position 3,234. Some soft-clipped parts of reads cannot be found anywhere at all, others map to various parts of mtDNA - again no clear pattern.

4) for reads that map near position 3,234, I looked where the other read in the pair maps - again, there is no clear pattern

5) looked at incorrectly oriented reads - there are some throughout the alignment and for the ones where one read is near position 3,234 again there is no clear pattern to where the second read in the pair is.

Thank you very much in advance!

next-gen sequencing alignment igv mtdna • 1.3k views
ADD COMMENT
0
Entering edit mode

Please use these directions to add images: How to add images to a Biostars post

ADD REPLY
0
Entering edit mode

Looking at the BAM may be too late to diagnose what is happening. I would start by grepping your raw reads for mtDNA sequence around that 3234 area - agatggc, not much longer than that. https://www.mitomap.org/foswiki/bin/view/MITOMAP/HumanMitoSeq

If they don't even exist you know it's something in the wet lab.

If they do exist then align those in isolation - using BLAST and BWA - see why they don't align and get back to us.

ADD REPLY
3
Entering edit mode
4.1 years ago
Michael 54k

First thing to do is to investigate the region in a genome browser. Here is how your region looks in UCSC. Your coordinate overlaps an annotated repeat: tRNA-Leu-TTA(m) seemingly a tandem repeat of variable length. Your read pairs likely do not span the real repeat region which is actually longer than the sequence depicted in the assembly, or if they do span it, the resulting read sequence is not identical to the reference. This is also the reason why you couldn't assemble this region.

So, when you look at the genome at this spot, for which the actual assembly gives:

>hg38_rmsk_tRNA-Leu-TTA(m) range=chrM:3230-3307 5'pad=0 3'pad=0 strand=+ repeatMasking=lower
gttaagatggcagagcccggtaatcgcataaaacttaaaactttacagtc
agaggttcaattcctcttcttaacaaca

it could be in fact 10 or 100 times that sequence, and with variability between individuals as well. No read pair could span the beginning of the repeat and still align to the reference.

If you need the exact sequence length at this site, you could do nanopore sequencing. I estimate that the actual repeat rate is not much more than 10x - 20x, limited by the fragment size of the specific polymerase used during PCR (pcr_length = 1400bp+78*x if there is only this repeat). You could maybe also measure your real fragment length using a southern blot.

ADD COMMENT
0
Entering edit mode

thank you very much!!

ADD REPLY
0
Entering edit mode
4.1 years ago

You've got some kind of indel going on there. That's what causes that coverage pattern.

de novo assembled several samples - in all cases the region downstream of position 3,234 didn't assemble correctly

Not 'correctly'? Or not what you expect?

ADD COMMENT
0
Entering edit mode

Yes, an "insertion" at/in a tandem repeat ;)

ADD REPLY
0
Entering edit mode

Thank you! By correctly I meant if the reads don't map back to the de novo assembly creating at least somewhat uniform coverage and instead the alignment is spotty, there must be an assembly issue.

ADD REPLY

Login before adding your answer.

Traffic: 3082 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6