Question

Samtools Pileup Of Millions Of Reads From A Single Amplicon

1

10.2 years ago

Sebastian ▴ 10

Hi all,

We would like to pileup millions of reads from a single amplicon for ultra-sensitive mutation detection.

Since SAMtools pileup is limited to several thousand reads at a given position, I am wondering whether you could suggest an alternative approach or workaround.

Any feedback is highly appreciated!

samtools pileup sequencing • 2.8k views

ADD COMMENT • link updated 10.2 years ago by Christian ★ 3.0k • written 10.2 years ago by Sebastian ▴ 10

1

If you sequence that deeply, how do you make sure that 99% of your reads are not PCR duplicates?

ADD REPLY • link 10.2 years ago by Christian ★ 3.0k

0

In fact, they are PCR duplicates, since the reads derive from amplicons. But that's not a problem, we want to detect single reads out of more than one million having a specific (known) mutation.

ADD REPLY • link 10.2 years ago by Sebastian ▴ 10

0

"But that's not a problem, we want to detect single reads out of more than one million having a specific (known) mutation." So you just want to look at the CIGAR string of each read, don't you?

ADD REPLY • link 10.2 years ago by Pierre Lindenbaum 161k

0

That is one approach we have already considered. But with millions of reads we also have to account for sequencing errors, both at the mutation site and at flanking sites, so our mutation may be represented by several different CIGAR strings. Furthermore, different mutations can share a CIGAR string: any 4-base mutation at the same position, for example, will result in the same one. Finally, we may need to detect unknown mutations in a known hotspot in the future, which is why we need a flexible approach.
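
One way to sidestep the CIGAR ambiguity described here is to use the CIGAR string only for coordinates and then tally the actual read base at the position of interest. The following is a minimal sketch, not a tested pipeline: the `(start, cigar, seq)` tuples, the function names `base_at` and `count_alleles`, and the 0-based coordinates are all assumptions made for illustration (the POS field in SAM text is 1-based, so subtract 1 first).

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def base_at(read_start, cigar, seq, ref_pos):
    """Return the read base aligned to ref_pos (0-based),
    '-' if the read has a deletion there, or None if not covered."""
    ref = read_start  # next reference coordinate to consume
    qry = 0           # next index into seq
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":            # consumes reference and read
            if ref <= ref_pos < ref + length:
                return seq[qry + (ref_pos - ref)]
            ref += length
            qry += length
        elif op in "DN":           # consumes reference only
            if ref <= ref_pos < ref + length:
                return "-"
            ref += length
        elif op in "IS":           # consumes read only
            qry += length
        # H and P consume neither reference nor read
    return None

def count_alleles(reads, ref_pos):
    """Tally the base each read shows at ref_pos; reads that do not
    cover the position are skipped."""
    counts = Counter()
    for start, cigar, seq in reads:
        base = base_at(start, cigar, seq, ref_pos)
        if base is not None:
            counts[base] += 1
    return counts
```

Because reads are streamed one at a time, memory use does not grow with depth, and two reads with the identical CIGAR `10M` but different bases at the hotspot end up in different bins.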

ADD REPLY • link 10.2 years ago by Sebastian ▴ 10

0

Shuffle and downsample your BAM? Or are you just searching for the reads that have a SNP?
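
Downsampling without shuffling the whole file first can be sketched with reservoir sampling, which keeps a uniform random subset of fixed size while reading the records once. This is only an illustration: `reservoir_sample` is a hypothetical helper, and `reads` can be any iterable of records.

```python
import random

def reservoir_sample(reads, k, seed=0):
    """Return k records chosen uniformly at random from an iterable
    of unknown length (all of them if it holds fewer than k)."""
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    sample = []
    for i, read in enumerate(reads):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(read)
        else:
            # Record i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = read
    return sample
```

A uniform subsample preserves allele frequencies in expectation, but a variant carried by a single read in a million will usually be lost, so this suits the high-frequency case better than the ultra-rare one.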

ADD REPLY • link 10.2 years ago by Pierre Lindenbaum 161k

0

Yes, we want to detect the reads that have a mutation.

ADD REPLY • link 10.2 years ago by Sebastian ▴ 10

0

Cross-posted on the Samtools mailing list: http://sourceforge.net/mailarchive/forum.php?thread_name=20854588711E4A489A3AD70C9BA5548A01AE472348A7%40XCH11.scidom.de&forum_name=samtools-help

ADD REPLY • link 10.2 years ago by Pierre Lindenbaum 161k

0

And cross-posted on SEQanswers: http://seqanswers.com/forums/showthread.php?t=41050

ADD REPLY • link 10.2 years ago by Pierre Lindenbaum 161k

score 0 · Answer 1 · 2014-02-20

0

10.2 years ago

Christian ★ 3.0k

Could you throw away all reads that match the reference at your hotspot and pileup the rest? This should massively decrease your read count.
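
A rough sketch of such a pre-filter is shown below. It assumes hypothetical `(start, cigar, seq)` records with 0-based starts, a `reference` string holding the amplicon sequence in the same coordinates, and a half-open hotspot window `[hot_start, hot_end)`; the function names are made up for illustration. Reads with anything other than a pure-match CIGAR, or that do not span the whole window, are kept conservatively rather than discarded.

```python
import re

# A CIGAR consisting of a single M operation, e.g. "150M".
PURE_MATCH = re.compile(r"^\d+M$")

def differs_in_hotspot(start, cigar, seq, reference, hot_start, hot_end):
    """True if the read might differ from the reference inside
    the half-open window [hot_start, hot_end)."""
    if not PURE_MATCH.match(cigar):
        return True  # indels or clipping: keep, do not guess
    if start > hot_start or start + len(seq) < hot_end:
        return True  # window not fully covered: keep, do not guess
    offset = hot_start - start
    window = seq[offset:offset + (hot_end - hot_start)]
    return window != reference[hot_start:hot_end]

def keep_candidates(reads, reference, hot_start, hot_end):
    """Drop reads that match the reference across the hotspot."""
    return [
        (start, cigar, seq)
        for start, cigar, seq in reads
        if differs_in_hotspot(start, cigar, seq, reference, hot_start, hot_end)
    ]
```

Counting the discarded reads as well as the kept ones preserves the denominator needed to turn the surviving variant reads into a frequency.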

ADD COMMENT • link 10.2 years ago by Christian ★ 3.0k

0

Unfortunately, this is only possible when I expect a really low frequency. Usually, I don't know the mutation frequency in my sample, and it might be up to 50%. Furthermore, I would expect lots of wild-type reads that contain sequencing errors and thus also do not match the reference.

ADD REPLY • link 10.2 years ago by Sebastian ▴ 10

0

You could address the first issue with downsampling. The second could be mitigated by checking base qualities.
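
The base-quality check could look roughly like this. It is a sketch under assumptions: each observation is a hypothetical `(base, quality_char)` pair taken from one read at the position of interest, qualities use the Phred+33 encoding of the SAM QUAL field, and the threshold of 30 is an arbitrary example value.

```python
from collections import Counter

def phred(qual_char, offset=33):
    """Decode one Phred+33 quality character into an integer score."""
    return ord(qual_char) - offset

def count_high_quality(observations, min_qual=30):
    """Tally bases whose quality score is at least min_qual."""
    counts = Counter()
    for base, qual_char in observations:
        if phred(qual_char) >= min_qual:
            counts[base] += 1
    return counts
```

A score of 30 corresponds to a nominal error probability of 1 in 1,000, so a quality cutoff alone cannot separate true variants from errors at frequencies around one in a million.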

ADD REPLY • link 10.2 years ago by Christian ★ 3.0k