Many reads aligned to the same position but with different variants
3.5 years ago
yliueagle ▴ 290

(Please open this link if the image is not displayed: https://ibb.co/mHYH8NC)

I have two questions related to the following alignment from a single sequencing sample of a cell line:

(1) Do the reads at the bottom represent PCR duplicates, given that they are aligned to exactly the same position? (2) If they are duplicates, why are there so many different variants among them (e.g., at the position near 60795540)?

Thanks for your answer!

alignment duplicates reads • 775 views

Do the reads at the bottom represent PCR duplicates, given that they are aligned to exactly the same position?

We can't see the full reads, but there are too many differences among them in this region alone for them to be PCR duplicates. PCR duplicates would normally share the same start and end positions and differ from each other at only a small number of bases.

Run a tool like clumpify.sh if you really want to identify duplicates; see the Biostars post "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files".
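
The rule of thumb in this reply (same start/end, only a small number of differences) can be sketched in a few lines of Python. This is a minimal illustration, not how clumpify.sh or any other tool works internally: the read records, the function names, and the `max_diffs=2` threshold are all hypothetical, and reads in a group are assumed to have equal length.

```python
from collections import defaultdict
from itertools import combinations


def count_mismatches(seq_a, seq_b):
    """Count positions where two equal-length read sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def group_by_coordinates(reads):
    """Group reads that share chromosome, start, end and strand."""
    groups = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["start"], read["end"], read["strand"])
        groups[key].append(read)
    return groups


def max_pairwise_mismatches(group):
    """Largest mismatch count between any two reads in a group (0 for one read)."""
    return max(
        (count_mismatches(a["seq"], b["seq"]) for a, b in combinations(group, 2)),
        default=0,
    )


def duplicate_candidates(reads, max_diffs=2):
    """Keep coordinate groups with several reads that differ at <= max_diffs bases.

    max_diffs is an arbitrary illustrative threshold, not a recommended value.
    """
    return {
        key: group
        for key, group in group_by_coordinates(reads).items()
        if len(group) > 1 and max_pairwise_mismatches(group) <= max_diffs
    }


# Made-up reads: r1/r2 differ at one base, r3/r4 differ at four bases.
example_reads = [
    {"name": "r1", "chrom": "chr1", "start": 100, "end": 110, "strand": "+", "seq": "ACGTACGTAC"},
    {"name": "r2", "chrom": "chr1", "start": 100, "end": 110, "strand": "+", "seq": "ACGTACGTAT"},
    {"name": "r3", "chrom": "chr1", "start": 200, "end": 210, "strand": "+", "seq": "ACGTACGTAC"},
    {"name": "r4", "chrom": "chr1", "start": 200, "end": 210, "strand": "+", "seq": "TTGTAAGTCC"},
]

candidates = duplicate_candidates(example_reads)
```

With these made-up reads, only the group at 100-110 is kept as a duplicate candidate; the group at 200-210 shares coordinates but has too many differences, which is the situation described in the question.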


Thanks for your answer. I have updated the figure. These reads mapped exactly to the same region except that they have different variants, especially at the position near 60795540.


These reads mapped exactly to the same region except that they have different variants

Then they don't quite fit the definition of PCR duplicates. Perhaps you allowed too many mismatches when the reads were originally aligned, which lets these reads map here even if they do not come from this region. Is there any soft-clipping that we can't see in that image? If you want to identify PCR duplicates, use the clumpify method.
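
One way to answer the soft-clipping question without relying on the image is to look at the CIGAR strings of the reads in the alignment file. The following is a rough sketch that parses a CIGAR string with a regular expression and reports how many bases are soft-clipped at each end; the function names are hypothetical, and the clipping-adjusted start shown here only accounts for soft clips (hard clips are ignored).

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5S95M' into [(5, 'S'), (95, 'M')]."""
    if cigar == "*":  # CIGAR unavailable
        return []
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Reject strings containing anything the pattern did not consume.
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def soft_clips(cigar):
    """Return (leading, trailing) soft-clipped base counts."""
    # Hard clips can only sit at the very ends, so drop them before looking for 'S'.
    ops = [(length, op) for length, op in parse_cigar(cigar) if op != "H"]
    leading = ops[0][0] if ops and ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


def soft_unclipped_start(pos, cigar):
    """Alignment start moved left by the leading soft clip."""
    return pos - soft_clips(cigar)[0]


# Made-up examples: position 1000 is arbitrary.
for cigar in ["100M", "5S95M", "3H4S90M6S"]:
    print(cigar, soft_clips(cigar), soft_unclipped_start(1000, cigar))
```

Reads whose reported start positions match but whose soft clips differ did not necessarily come from the same fragment ends, so checking this is a reasonable step before deciding whether a pile of reads could be duplicates.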
