Issue with duplicate reads in exome data
3.9 years ago

I have WES data and I have aligned it to hg19 using bwa mem.

In BAM QC using Qualimap, I am getting 26% duplicate reads in the BAM file. I used Picard with the REMOVE_DUPLICATES=true option to remove the duplicates, then ran Qualimap again on the Picard output, where the percentage of duplicate reads shown was 18%.

On the other hand, I used samtools view -F 1024 file.bam | wc -l to count the duplicate reads, but the output matches that of samtools view file.bam | wc -l, which gives the total number of reads, i.e. zero duplicates.
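
A note on the flag filter used here: in the SAM specification, FLAG bit 1024 (0x400) marks a PCR or optical duplicate. With samtools view, -F excludes reads that carry the given bit and -f keeps only reads that carry it, so -F 1024 counts the reads that are not flagged as duplicates. On a BAM written with REMOVE_DUPLICATES=true the flagged reads should already be gone, so matching counts would be expected. A minimal Python sketch of the two filters, using invented FLAG values rather than a real BAM file:

```python
DUPLICATE_FLAG = 0x400  # 1024 in the SAM specification: PCR or optical duplicate

# Invented FLAG values for four reads; only the second has the duplicate bit set.
flags = [99, 1123, 147, 83]

# Mirrors `samtools view -F 1024`: keep reads WITHOUT the duplicate bit.
not_duplicates = [f for f in flags if not f & DUPLICATE_FLAG]

# Mirrors `samtools view -f 1024`: keep reads WITH the duplicate bit.
duplicates = [f for f in flags if f & DUPLICATE_FLAG]

print(len(not_duplicates), len(duplicates))
```

To count flagged duplicates directly in a BAM where they were marked but not removed, samtools view -c -f 1024 file.bam is the corresponding command.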

Can we remove PCR duplicates completely?

Thanks in advance.

sequencing alignment duplicates BAM • 1.0k views
3.9 years ago

I don't know exactly how Qualimap measures duplication, but a quick look at the documentation strongly suggests to me that it measures the single-end duplication rate, whereas Picard removes reads on the basis of paired-end duplication. In single-end duplication, a read need only share its alignment start position with another read to be called a duplicate, whereas in paired-end duplication a read must share both its start position and the position of its mate with another read.
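
To make the difference between the two definitions concrete, here is a toy Python sketch. The coordinates are invented, and the comparison is deliberately simplified (it ignores strand, orientation and soft-clipping, which real tools take into account), so it illustrates the idea rather than what Qualimap or Picard actually do internally.

```python
from collections import Counter

# Toy read pairs: (chromosome, read_start, mate_start). All values are invented.
pairs = [
    ("chr1", 100, 300),
    ("chr1", 100, 350),  # same start as the first pair, different mate position
    ("chr1", 100, 300),  # same start and same mate position as the first pair
    ("chr1", 200, 420),
]

def count_duplicates(keys):
    """Count items beyond the first occurrence of each key."""
    counts = Counter(keys)
    return sum(n - 1 for n in counts.values())

# Single-end style: only the read's own start position is compared.
single_end_dups = count_duplicates((chrom, start) for chrom, start, _ in pairs)

# Paired-end style: the read's start and its mate's position must both match.
paired_end_dups = count_duplicates(pairs)

print(single_end_dups, paired_end_dups)
```

On the same reads, the single-end definition calls more duplicates than the paired-end one. If Qualimap does use the single-end definition, that could account for a non-zero rate being reported after Picard has already removed the paired-end duplicates.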

Where you have paired-end data, you almost always care about paired-end duplicates, not single-end duplicates.
