Many reads aligned to the same position but with different variants
21 months ago
yliueagle ▴ 270

I have two questions related to the following alignment from a single sequencing sample of a cell line:

(1) Do the reads at the bottom represent PCR duplicates, given that they are aligned to exactly the same position? (2) If they are duplicates, why are there so many different variants among them (e.g., at the position near 60795540)?

alignment duplicates reads • 508 views

Do the reads at the bottom represent PCR duplicates, given that they are aligned to exactly the same position?

We can't see the full reads, but there are too many differences among them in this region alone for them to be PCR duplicates. True PCR duplicates would normally share the same start and end positions and differ from each other at only a small number of bases.
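To make that rule of thumb concrete, here is a minimal Python sketch (not what any particular deduplication tool does internally). It groups reads by chromosome, start, end, and strand, then reports the largest pairwise mismatch count in each group. The function names, the `max_diffs=2` threshold, and the toy reads are all hypothetical and chosen only for illustration; it also assumes reads in a group have equal length (no indels).

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def candidate_duplicates(reads, max_diffs=2):
    """Group reads by (chrom, start, end, strand) and flag groups whose
    members all differ from each other by at most max_diffs bases.

    reads: iterable of (name, chrom, start, end, strand, seq) tuples.
    max_diffs: hypothetical tolerance for sequencing errors.
    """
    groups = defaultdict(list)
    for name, chrom, start, end, strand, seq in reads:
        groups[(chrom, start, end, strand)].append((name, seq))

    report = {}
    for key, members in groups.items():
        if len(members) < 2:
            continue  # a single read cannot be a duplicate of anything
        worst = max(
            hamming(s1, s2) for (_, s1), (_, s2) in combinations(members, 2)
        )
        report[key] = {
            "names": [name for name, _ in members],
            "max_pairwise_diffs": worst,
            "plausible_duplicates": worst <= max_diffs,
        }
    return report


# Toy, made-up reads: the first group differs at one base, the second at many.
reads = [
    ("r1", "chr1", 100, 108, "+", "ACGTACGT"),
    ("r2", "chr1", 100, 108, "+", "ACGTACGT"),
    ("r3", "chr1", 100, 108, "+", "ACGAACGT"),
    ("r4", "chr1", 200, 208, "+", "ACGTACGT"),
    ("r5", "chr1", 200, 208, "+", "TTTTACGA"),
    ("r6", "chr1", 300, 308, "+", "ACGTACGT"),
]
report = candidate_duplicates(reads, max_diffs=2)
for key, info in report.items():
    print(key, info)
```

Reads that share coordinates but fail the mismatch threshold, like the second toy group, are the situation described in the question: same position, but too divergent to be copies of one original fragment.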

Run a tool like clumpify.sh if you really want to identify duplicates; see the post "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files".


Thanks for your answer. I have updated the figure. These reads map to exactly the same region but carry different variants, especially at the position near 60795540.


These reads mapped exactly to the same region except that they have different variants

Then they don't quite fit the definition of PCR duplicates. Perhaps the aligner settings allow too many mismatches, which would let these reads map here even if they do not originate from this region. Is there any soft-clipping that we can't see in that image? If you want to identify PCR duplicates, use the clumpify method.
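One way to check for hidden soft-clipping is to look at the CIGAR strings of the reads in question. Below is a minimal Python sketch, assuming you have already extracted the POS and CIGAR fields from the SAM/BAM records; the helper names are hypothetical. It reports the soft-clipped length at each end and a simplified "unclipped" start that subtracts leading soft clips only (hard clips are skipped).

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5S95M' into [(5, 'S'), (95, 'M')]."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Reject strings containing anything the pattern did not consume.
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clipped base counts, ignoring hard clips."""
    core = [(length, op) for length, op in parse_cigar(cigar) if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def unclipped_start(pos, cigar):
    """Shift the 1-based SAM POS left by the leading soft clip.

    Simplification: only soft clips are counted here.
    """
    left, _ = soft_clip_lengths(cigar)
    return pos - left


# Made-up examples, not taken from the alignment in the question.
for pos, cigar in [(1000, "100M"), (1000, "5S95M"), (1000, "3H5S80M12S")]:
    print(pos, cigar, soft_clip_lengths(cigar), unclipped_start(pos, cigar))
```

If many of the reads stacked at this position turn out to carry large soft clips, that would support the idea that they are being forced to align here rather than being true duplicates.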