Samtools Pileup Of Millions Of Reads From A Single Amplicon
10.2 years ago
Sebastian ▴ 10

Hi all,

We would like to pileup millions of reads from a single amplicon for ultra-sensitive mutation detection.

Since SAMtools pileup is limited to several thousand reads at a given position, could you suggest an alternative approach or workaround?
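One possible workaround, sketched here under assumptions: skip the pileup engine and tally the alleles at the hotspot yourself while streaming the SAM text that `samtools view` prints for the region. This avoids any per-position depth cap. The function names (`base_at`, `count_alleles`) and the base-quality threshold of 20 are illustrative choices, not part of SAMtools. Duplicate-flagged reads (0x400) are deliberately not filtered, because amplicon reads are duplicates by design. Depending on the SAMtools version, raising the `-d` (max depth) option of `samtools mpileup` may also be enough.

```python
import re
from collections import Counter

# CIGAR operations as defined in the SAM specification
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def base_at(ref_start, cigar, seq, qual, target):
    """Return (base, phred) for 1-based reference position `target`,
    ("-", None) if the read has a deletion there, or None if not covered."""
    ref_pos, read_pos = ref_start, 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":  # consumes read and reference
            if ref_pos <= target < ref_pos + n:
                i = read_pos + (target - ref_pos)
                q = ord(qual[i]) - 33 if qual != "*" else None  # Phred+33
                return seq[i], q
            ref_pos += n
            read_pos += n
        elif op in "IS":  # consumes read only
            read_pos += n
        elif op in "DN":  # consumes reference only
            if ref_pos <= target < ref_pos + n:
                return ("-", None) if op == "D" else None
            ref_pos += n
        # H and P consume neither read nor reference
    return None


def count_alleles(sam_lines, chrom, target, min_baseq=20):
    """Tally bases at chrom:target over every read, with no depth limit."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        f = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(f[1]), f[2], int(f[3]), f[5]
        seq, qual = f[9], f[10]
        # skip unmapped (0x4), secondary (0x100) and supplementary (0x800)
        if rname != chrom or cigar == "*" or flag & 0x904:
            continue
        hit = base_at(pos, cigar, seq, qual, target)
        if hit is None:
            continue
        base, q = hit
        # deletions (and reads without qualities) have no base quality to test
        if q is None or q >= min_baseq:
            counts[base] += 1
    return counts
```

A small hypothetical wrapper script could pass `sys.stdin` to `count_alleles(sys.stdin, "chr1", 103)` and be fed from `samtools view in.bam chr1:103-103`.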

Any feedback is highly appreciated!

If you sequence this deeply, how do you make sure that 99% of your reads are not PCR duplicates?

In fact they are PCR duplicates, as the reads derive from amplicons. But that's not a problem: we want to detect single reads carrying a specific (known) mutation among more than one million reads.

"But that's not a problem, we want to detect single reads out of more than one million having a specific (known) mutation." So you just want to look at the CIGAR string of each read, is that right?

That is another approach we have already considered. But with millions of reads we also have to account for sequencing errors, both at the mutation site and at the flanking sites, so the same mutation may appear with different CIGAR strings. Conversely, any 4-base mutation at the same position (e.g. insertions of different 4-base sequences) results in the same CIGAR string. Third, we may also need to detect unknown mutations in a known hotspot in the future, which is why we need a flexible approach.
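To address the three points above (one mutation with several CIGARs, one CIGAR for several mutations, unknown variants in a known hotspot), here is a sketch that stops using the CIGAR as the classifier. Instead, it reconstructs each read's actual sequence over the hotspot window and counts the distinct haplotypes. It assumes the reads are already parsed into (POS, CIGAR, SEQ) tuples, e.g. from `samtools view` output. `window_haplotype` and `tally_haplotypes` are hypothetical names. Deleted reference bases are written as `-` and inserted bases in lowercase.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def window_haplotype(ref_start, cigar, seq, win_start, win_end):
    """Read sequence over reference positions win_start..win_end (1-based,
    inclusive), or None if the read does not span the whole window."""
    out, covered = [], 0
    ref_pos, read_pos = ref_start, 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":  # aligned bases: consume read and reference
            for k in range(n):
                if win_start <= ref_pos + k <= win_end:
                    out.append(seq[read_pos + k])
                    covered += 1
            ref_pos += n
            read_pos += n
        elif op == "I":  # insertion sits between ref_pos - 1 and ref_pos
            # keep it only if both flanking reference bases are in the window
            if win_start < ref_pos <= win_end:
                out.append(seq[read_pos:read_pos + n].lower())
            read_pos += n
        elif op == "D":  # deleted reference bases
            for k in range(n):
                if win_start <= ref_pos + k <= win_end:
                    out.append("-")
                    covered += 1
            ref_pos += n
        elif op == "S":
            read_pos += n
        elif op == "N":
            ref_pos += n
        # H and P consume neither read nor reference
    if covered != win_end - win_start + 1:
        return None
    return "".join(out)


def tally_haplotypes(reads, win_start, win_end):
    """Count distinct hotspot haplotypes over (pos, cigar, seq) tuples."""
    counts = Counter()
    for pos, cigar, seq in reads:
        hap = window_haplotype(pos, cigar, seq, win_start, win_end)
        if hap is not None:
            counts[hap] += 1
    return counts
```

A known mutation then becomes a single key lookup in the tally. Unknown variants in the hotspot show up as additional keys, even when their CIGAR is identical to the wild type (for example, a 4-base substitution under a plain `10M`).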

Shuffle and downsample your BAM? Or are you just searching for the reads that carry a SNP?

Yes, we want to detect the reads that carry a mutation.

10.2 years ago
Christian ★ 3.0k

Could you throw away all reads that match the reference at your hotspot and pile up the rest? This should massively decrease your read count.
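Here is a minimal sketch of this pre-filter. It assumes SAM text input and a known reference sequence for the hotspot window. A read is dropped only if it has a gapless alignment (optionally soft-clipped) spanning the whole window and its bases there equal the reference. Reads with indels, partial coverage or other CIGARs are kept for the pileup, so the filter errs on the side of keeping reads. It also returns how many reference reads were dropped, so an allele frequency can still be computed afterwards. `is_reference_read` and `split_reads` are hypothetical helpers.

```python
import re

# gapless alignment, optionally soft-clipped on either side
GAPLESS = re.compile(r"^(?:(\d+)S)?(\d+)M(?:\d+S)?$")


def is_reference_read(pos, cigar, seq, win_start, win_end, ref_window):
    m = GAPLESS.match(cigar)
    if not m:
        return False  # indels, hard clips, unmapped ("*"): keep for pileup
    clip = int(m.group(1) or 0)
    aligned = int(m.group(2))
    if pos > win_start or pos + aligned - 1 < win_end:
        return False  # does not span the hotspot: keep (conservative)
    i = clip + (win_start - pos)
    window = seq[i:i + (win_end - win_start + 1)]
    return window.upper() == ref_window.upper()


def split_reads(sam_lines, chrom, win_start, win_end, ref_window):
    """Return (lines to keep, number of dropped reference reads)."""
    kept, n_ref = [], 0
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # keep header so the output is still valid SAM
            continue
        f = line.split("\t")
        if f[2] == chrom and is_reference_read(
            int(f[3]), f[5], f[9], win_start, win_end, ref_window
        ):
            n_ref += 1
        else:
            kept.append(line)
    return kept, n_ref
```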

Unfortunately, this is only an option when I expect a very low mutation frequency. Usually I don't know the frequency in my sample, and it might be up to 50%. Furthermore, I would expect many wild-type reads that contain sequencing errors and therefore also do not match the reference.

You could address the first issue with downsampling, and mitigate the second by checking base qualities.
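Both suggestions can be sketched in a few lines. The downsampling below decides per read name from a hash, so both mates of a pair get the same decision and reruns give the same subset. `samtools view -s` offers built-in subsampling as well. The quality check requires every base over the hotspot to reach a Phred threshold before a mismatch is trusted. The threshold of 30, the MD5-based hash and the function names are assumptions chosen for illustration.

```python
import hashlib


def keep_read(qname, fraction, seed=0):
    """Keep a read with probability ~fraction, decided by a hash of its name."""
    digest = hashlib.md5(f"{seed}:{qname}".encode()).digest()
    # top 53 bits -> a float uniformly distributed in [0, 1)
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return u < fraction


def fraction_for_depth(observed_depth, target_depth):
    """Subsampling fraction that brings observed depth down to target depth."""
    if observed_depth <= 0:
        return 1.0
    return min(1.0, target_depth / observed_depth)


def hotspot_quality_ok(qual_slice, min_q=30):
    """True if every hotspot base has Phred quality >= min_q (SAM Phred+33)."""
    return all(ord(c) - 33 >= min_q for c in qual_slice)
```

Downsampling keeps the ratio of mutant to wild-type reads unchanged in expectation, so a frequency of up to 50% stays measurable on the subset. Very rare mutant reads, however, may be lost, which is one reason to combine it with the reference-read filter suggested above.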