Question: How to Explain Uneven Coverage of a DNA Segment Obtained via PCR Amplification

rohan (United States), 5.1 years ago, wrote:

Experiment: deep sequencing to detect mutants in a 700 nt fragment.

The DNA fragment was pre-amplified by PCR using primers flanking it, and then sequenced on a HiSeq.

Per-base coverage was calculated with `coverageBed -d -abam in.bam -b ref.bed > out.cov`.
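
A minimal Python sketch for putting a number on the end peaks in `out.cov`. It assumes `ref.bed` holds a single three-column interval, so that each line of the `coverageBed -d` output is chrom, start, end, 1-based position within the interval, and depth. The function names and the 50-position window are illustrative choices, not part of bedtools.

```python
import io
from statistics import mean

def read_depths(handle):
    """Parse `coverageBed -d` output for a single BED interval.

    Assumes a 3-column BED, so every line is:
    chrom, start, end, 1-based position in the interval, depth.
    Returns the depths ordered by position.
    """
    rows = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        rows.append((int(fields[-2]), int(fields[-1])))
    return [depth for _pos, depth in sorted(rows)]

def end_to_middle_ratio(depths, window=50):
    """Mean depth over the first and last `window` positions divided by the mean depth in between."""
    if len(depths) <= 2 * window:
        raise ValueError("interval is too short for the chosen window")
    ends = depths[:window] + depths[-window:]
    middle = depths[window:-window]
    return mean(ends) / mean(middle)

# Toy input standing in for out.cov: 200 positions, depth 1000 within 50 positions of each end, 100 elsewhere.
toy = "".join(
    f"chr1\t0\t200\t{i}\t{1000 if i <= 50 or i > 150 else 100}\n" for i in range(1, 201)
)
print(end_to_middle_ratio(read_depths(io.StringIO(toy))))  # 10.0
```

For real data, pass an open file instead, e.g. `with open("out.cov") as fh: depths = read_depths(fh)`. Running it on BAMs with and without the primer-containing reads gives one number for how much of the peak those reads account for.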

Observation: there are two distinct peaks in coverage at the ends, as in the plot below (coverage vs. position).

[Figure: per-base coverage vs. position across the 700 bp target region]

The peaks consist of reads that contain part of the primers, and these reads therefore also show soft clipping at their ends.

There is a huge difference in the calculations depending on whether I include or exclude such reads.

Question: does anyone know how to handle such a situation?

Tags: bedtools, coverage

Can you make that region wider? What happens further out? Also, can you indicate the primer locations?

(Istvan Albert, 5.1 years ago)

Shown above is the coverage of the 700 bp region of interest. Further out, there is a steep decrease in coverage.

The primers flank the region, extending ~10 nt outside and ~10 nt inside the target region, as shown below.

[Figure: primer positions relative to the target region]

(rohan, 5.1 years ago)

Is it possible that you are sequencing the primers there? Basically, primer + Illumina adapter.

(Istvan Albert, 5.1 years ago)

The target region was gel-purified after PCR, so this possibility seems less likely. I identified mutants in those reads, so I think they are not coming from primers or adapters.

(rohan, 5.1 years ago)

This is very easy to check in your data: count how many reads consist of a primer followed directly by the Illumina adapter. You should remove these reads.

(Istvan Albert, 5.1 years ago)
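
Istvan's check can be sketched in Python. The specifics are assumptions: the primer sequences are placeholders for your own, `AGATCGGAAGAGC` is taken as the start of the common Illumina TruSeq adapter, and only exact matches at the start of a read are counted. Reads with sequencing errors in the primer or adapter are therefore missed.

```python
from collections import Counter

# Hypothetical primer sequences: replace with the actual forward/reverse primers.
PRIMERS = {
    "fwd": "ACGTTGCAAGGCTTACGATC",
    "rev": "TTGACCGGATCAATGCCTAG",
}
# Start of the common Illumina TruSeq adapter sequence.
ADAPTER_PREFIX = "AGATCGGAAGAGC"

def classify_read(seq, primers=PRIMERS, adapter=ADAPTER_PREFIX):
    """Return the primer name if the read is that primer directly followed by adapter, else None.
    Exact matching only."""
    seq = seq.upper()
    for name, primer in primers.items():
        if seq.startswith(primer + adapter):
            return name
    return None

def count_primer_adapter_reads(seqs):
    """Count primer+adapter reads per primer."""
    counts = Counter(classify_read(s) for s in seqs)
    counts.pop(None, None)  # discard reads that are not primer + adapter
    return dict(counts)

reads = [
    PRIMERS["fwd"] + ADAPTER_PREFIX + "ACACGTCTGAACTCCAGTCAC",  # primer + adapter
    PRIMERS["fwd"] + "GGATCCATTGCAAGT",                          # primer + insert
    PRIMERS["rev"] + ADAPTER_PREFIX,                             # primer + adapter
]
print(count_primer_adapter_reads(reads))  # {'fwd': 1, 'rev': 1}
```

In practice the sequences would come from the FASTQ files (the second line of every four-line record). Comparing the count with the total number of reads in the peaks shows whether primer + adapter reads explain them.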

Andreas (Singapore), 5.1 years ago, wrote:

Hi,

We see these peaks in ultra-high-coverage viral amplicon sequencing as well. You need to ignore the primer positions, for several reasons. First, you are interested in the amplified target region, not in the primer regions. Sequence over the primer regions will, by definition, largely be identical to the primers used, but not necessarily to the target sequence (the primers may have bound imperfectly in the first cycles). You can often detect false-positive low-frequency variants at primer positions. Furthermore, the huge coverage bias might negatively affect downstream analysis (by the way: Picard's MarkDuplicates will likely not help here).

The sharp coverage drop can be caused by your sequencing setup. For example, say this was a larger region and you fragmented it before sequencing. While fragment ends would normally be evenly distributed across the region, there will always be a fragment end at the primer start, which is where you then see the sharp drop/increase.

Just my two cents,

Andreas
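
Ignoring the primer positions can be as simple as dropping variant calls (and coverage values) that fall inside the primer footprints before any downstream analysis. A minimal Python sketch follows. The `Variant` record, the primer intervals and the example calls are hypothetical; coordinates are 1-based and inclusive, relative to the amplicon.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    pos: int    # 1-based position within the amplicon
    ref: str
    alt: str
    count: int  # reads supporting the alternative base
    depth: int  # total reads covering the position

# Hypothetical primer footprints (1-based, inclusive) on a 720 nt amplicon.
PRIMER_INTERVALS = [(1, 20), (701, 720)]

def in_primer(pos, intervals=PRIMER_INTERVALS):
    """True if `pos` lies under any primer."""
    return any(start <= pos <= end for start, end in intervals)

def drop_primer_positions(variants, intervals=PRIMER_INTERVALS):
    """Keep only variants whose position is not covered by a primer."""
    return [v for v in variants if not in_primer(v.pos, intervals)]

calls = [
    Variant(12, "A", "G", 30, 90000),   # under the forward primer: likely artefact
    Variant(350, "C", "T", 45, 4500),   # inside the target region
    Variant(710, "G", "A", 25, 80000),  # under the reverse primer
]
kept = drop_primer_positions(calls)
print([v.pos for v in kept])  # [350]
```

The same `in_primer` test can be used to skip primer positions when summarizing coverage, so the primer-driven peaks do not feed into later normalization.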


Thanks a lot for the suggestion, this is interesting. So the shearing would cause this uneven coverage at the ends? I am not a wet-lab person, but I think ideally it should not, because shearing is supposed to be random.

I also wondered whether incomplete cycles during pre-amplification could have produced these 200 bp fragments, but that is not the case either, as all cycles had ample extension time (1 min for Thermo Phusion polymerase).

(rohan, 5.1 years ago)

I found this discussion, which is very similar to my case: "Samtools + Picard MarkDuplicates".

It recommends removing the duplicates, but you wrote "by the way: Picard's MarkDuplicates will likely not help here". Why would that be?

(rohan, 5.1 years ago)

If you have high-coverage data, MarkDuplicates will likely flag pretty much all of your reads as duplicates, because so many of them share the same alignment positions and look identical to it. This can also skew SNV frequencies in downstream analysis.

(Andreas, 4.9 years ago)
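
A simplified model shows why duplicate marking backfires on amplicon data. Reads count as duplicates when they share alignment start and strand, and one read is kept per group. This is not Picard's actual algorithm: for paired reads Picard also considers, among other things, the unclipped 5' positions and orientation of both mates. The read set is hypothetical: an unfragmented amplicon in which 1% of reads carry a real variant.

```python
def mark_duplicates_simplified(reads):
    """Keep the first read of each (start, strand) group; return the kept indices.
    Simplified model only, NOT Picard's algorithm."""
    first_seen = {}
    for i, (start, strand, _allele) in enumerate(reads):
        first_seen.setdefault((start, strand), i)
    return sorted(first_seen.values())

def alt_fraction(reads, indices):
    """Fraction of the selected reads that carry the alternative allele."""
    alleles = [reads[i][2] for i in indices]
    return alleles.count("alt") / len(alleles)

# Hypothetical unfragmented amplicon library: forward reads all start at the forward
# primer (position 1), reverse reads all at the reverse primer (position 720),
# and every 100th read carries a real variant.
reads = [
    (1 if i < 5000 else 720, "+" if i < 5000 else "-", "alt" if i % 100 == 99 else "ref")
    for i in range(10000)
]
kept = mark_duplicates_simplified(reads)
print(len(kept))                               # 2
print(alt_fraction(reads, range(len(reads))))  # 0.01
print(alt_fraction(reads, kept))               # 0.0
```

Of 10,000 reads only two survive, and the 1% variant disappears entirely. Real data are less extreme, but the direction of the bias is the one Andreas describes.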

seidel (United States), 5.1 years ago, wrote:

> there is a huge difference in the calculations...

The calculations of what? Are you trying to identify the mutants, or quantify them? (These are different questions, unless you're trying to do both.) I think Istvan is right, and as you describe, you have sequence from the primers, which are present at a higher concentration (by sequence) than the insert fragment. If you know what they are, why not trim them off? I can't really see a reason not to.
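
Trimming known primers off the read starts can be sketched in Python. The primer sequences are placeholders and only exact prefix matches are trimmed. Primers read through at the 3' end of short inserts are not handled, and a dedicated trimmer that tolerates mismatches is likely the better choice for real data. The quality string is cut in step with the sequence so the two stay aligned.

```python
# Hypothetical primer sequences: replace with the real ones.
PRIMERS = ("ACGTTGCAAGGCTTACGATC", "TTGACCGGATCAATGCCTAG")

def trim_primer(seq, qual, primers=PRIMERS):
    """If the read starts with one of the primers (exact match), remove it from
    both the sequence and the quality string; otherwise return them unchanged."""
    for primer in primers:
        if seq.upper().startswith(primer):
            n = len(primer)
            return seq[n:], qual[n:]
    return seq, qual

seq = PRIMERS[0] + "GGATCCATTG"
qual = "I" * len(seq)
print(trim_primer(seq, qual))  # ('GGATCCATTG', 'IIIIIIIIII')
```

After trimming, the bases under the primers no longer contribute to coverage at those positions, which removes the primer part of the end peaks before any normalization.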


By calculations, I meant normalizing the frequency of each mutant in the pool:

normalized mutant frequency = absolute frequency / coverage at that position

This is where the coverage introduces bias.

I will remove these reads; I will still have enough depth to work with.

Thanks for the suggestion.

(rohan, 5.1 years ago)
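
rohan's normalization, and why extra reads at the peaks bias it, can be written out in a few lines of Python. The counts are made-up numbers chosen only to show the arithmetic. Reads that cover a position but do not report the mutation (for example, because the base lies under a primer) inflate the denominator and dilute the frequency.

```python
def normalized_frequency(mutant_count, coverage):
    """Normalized mutant frequency = absolute mutant count / coverage at that position."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return mutant_count / coverage

# Hypothetical numbers for one position near the end of the amplicon.
mutant_reads = 50
insert_coverage = 5_000      # coverage from reads of the insert only
extra_peak_reads = 45_000    # extra reads in the end peak, assumed to carry the reference base

print(normalized_frequency(mutant_reads, insert_coverage))                     # 0.01
print(normalized_frequency(mutant_reads, insert_coverage + extra_peak_reads))  # 0.001
```

The same mutant count gives a tenfold lower frequency once the peak reads are included, which matches the kind of include/exclude discrepancy described in the question.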

swbarnes2 (United States), 4.9 years ago, wrote:

"two distinct peaks in coverage at the ends"

That's totally normal. Every single molecule of your PCR product, after all, already has a nice neat end right there. You can order special PCR primers with "blockers" to curtail that behavior. You will also observe very few reads aligning quite close to the edge; apparently, shearing very rarely happens, say, 20 bases away from the end of the amplicon.

Remember that the sequence under the primers will literally be primer sequence: if there is a mutation under a primer, you will never see it. So you don't need to worry about calling SNPs in those bases.
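
rohan asked whether random shearing should give even coverage. A toy simulation suggests the end peaks are expected even then. Every full-length amplicon already ends at the primers, so every molecule contributes reads starting exactly there, while interior positions only get reads when a random cut happens to fall nearby. This is a sketch under assumptions, not a model of the actual library prep. The amplicon length, read length, cuts per molecule and seed are made up, and one read is taken from each end of every fragment. It also does not try to reproduce the scarcity of reads just inside the ends that swbarnes2 mentions.

```python
import random
from statistics import mean

def simulate_coverage(amplicon_len=720, read_len=100, cuts_per_molecule=2,
                      n_molecules=2000, seed=1):
    """Toy model: cut each full-length amplicon at uniformly random positions,
    sequence `read_len` bases inward from both ends of every fragment,
    and return per-base coverage (0-based positions)."""
    rng = random.Random(seed)
    coverage = [0] * amplicon_len
    for _ in range(n_molecules):
        cuts = sorted(rng.sample(range(1, amplicon_len), cuts_per_molecule))
        bounds = [0] + cuts + [amplicon_len]
        for start, end in zip(bounds, bounds[1:]):  # fragment spans [start, end)
            fwd_end = min(start + read_len, end)    # read from the left end of the fragment
            rev_start = max(end - read_len, start)  # read from the right end of the fragment
            for pos in range(start, fwd_end):
                coverage[pos] += 1
            for pos in range(rev_start, end):
                coverage[pos] += 1
    return coverage

cov = simulate_coverage()
# Ratio of mean coverage in the first 20 positions to mean coverage in the middle.
print(round(mean(cov[:20]) / mean(cov[300:420]), 1))
```

Under these assumptions the first and last read lengths of the amplicon end up covered roughly twice as deeply as the middle, without any PCR artefact.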
