Issue with duplicate reads in exome data
3.8 years ago

I have WES data and I have aligned it to hg19 using bwa mem.

In BAM QC using Qualimap, I am getting 26% duplicate reads in the BAM file. I used Picard with the REMOVE_DUPLICATES=true option to remove the duplicates, then ran Qualimap again on the Picard output, where the reported percentage of duplicate reads was 18%.

On the other hand, I used samtools view -F 1024 file.bam | wc -l to count the duplicate reads, but the output matches that of samtools view file.bam | wc -l, which gives the total number of reads, suggesting zero duplicates.

Can we remove PCR duplicates completely?

Thanks in advance.

sequencing alignment duplicates BAM • 999 views
3.8 years ago

I don't know exactly how Qualimap measures duplication, but a quick look at the documentation strongly suggests to me that it measures the single-end duplication rate, whereas Picard removes reads on the basis of paired-end duplication. With single-end duplication, a read need only share its alignment start position with another read to be a duplicate, whereas with paired-end duplication a read must share both its start position and the position of its mate with another read in order to be called a duplicate.
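To illustrate why the two definitions can give different percentages, here is a minimal sketch on made-up data. The read tuples and the `duplicate_fraction` helper are hypothetical, and the logic is a simplification: real tools also take strand and orientation into account, and this is not how Qualimap or Picard are actually implemented.

```python
from collections import Counter

# Hypothetical toy reads: (chrom, start, mate_start). Not real data.
reads = [
    ("chr1", 100, 300),
    ("chr1", 100, 300),  # same start and same mate start: duplicate under both definitions
    ("chr1", 100, 350),  # same start, different mate start: duplicate only under single-end
    ("chr1", 200, 420),
]

def duplicate_fraction(keys):
    """Fraction of items that repeat an already seen key (one copy per key is kept)."""
    counts = Counter(keys)
    extra = sum(n - 1 for n in counts.values())
    return extra / len(keys)

# Single-end definition: only the read's own start position matters.
single_end = duplicate_fraction([(c, s) for c, s, _ in reads])

# Paired-end definition: the start position and the mate's position must both match.
paired_end = duplicate_fraction([(c, s, m) for c, s, m in reads])

print(single_end, paired_end)  # 0.5 0.25
```

In this toy case the single-end rate is twice the paired-end rate, so a tool using the single-end definition could still report duplicates after all paired-end duplicates have been removed.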

When you have paired-end data, you almost always care about paired-end duplicates, not single-end duplicates.
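On the samtools count in the question: 1024 (0x400) is the SAM FLAG bit for "PCR or optical duplicate". `-F 1024` excludes reads with that bit set, so it counts the non-duplicates, while `-f 1024` keeps only the flagged reads. If the command was run on the Picard output written with REMOVE_DUPLICATES=true, no flagged reads would be left, so both counts would be equal. A small sketch of the bit test, using hypothetical FLAG values rather than a real BAM file:

```python
DUP_FLAG = 0x400  # 1024: "PCR or optical duplicate" bit in the SAM FLAG field

# Hypothetical FLAG values for five reads (1123 = 1024 + 99, 1171 = 1024 + 147).
flags = [99, 147, 1123, 1171, 83]

def count_flagged(flag_values, bit=DUP_FLAG):
    """Count reads whose FLAG has the given bit set."""
    return sum(1 for f in flag_values if f & bit)

n_dup = count_flagged(flags)     # analogous to: samtools view -c -f 1024
n_not_dup = len(flags) - n_dup   # analogous to: samtools view -c -F 1024

print(n_dup, n_not_dup)  # 2 3
```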

