Question: MarkDuplicates Creating A Loss Of Mate Pair
6.9 years ago by
DG7.1k wrote:

This is the first time I have hit this problem, after dozens of runs of my exome-sequencing mapping, post-processing, and variant-calling pipeline. For this single sample, the BAM file output by MarkDuplicates appears to be missing the mates of many paired reads. Here is the command line I am using for MarkDuplicates:

java -Xmx4g -jar MarkDuplicates.jar CREATE_INDEX=true INPUT=input.bam OUTPUT=output.bam METRICS_FILE=metrics.txt REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=Lenient

When I check the input BAM file with ValidateSamFile it checks out fine; for the output, however, I get errors about mate not found for paired read. Any ideas? Has anyone had this happen to them?

exome gatk ngs markduplicates • 3.7k views
ADD COMMENTlink modified 6.9 years ago by Pierre Lindenbaum130k • written 6.9 years ago by DG7.1k

This is the behavior I have always seen with MarkDuplicates when removing duplicates: it generates orphan reads, and these orphan reads are unmapped.

More precisely, it happens when you have a pair in which only one read is mapped. When the mapped read is tagged as a duplicate and removed, MarkDuplicates leaves its unmapped mate in the BAM.
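
To make the mechanism concrete, here is a minimal Python sketch of the scenario described in this comment. It is not Picard's implementation: the records are hypothetical toy data given as (read name, flag) tuples, and `remove_duplicates` and `find_orphans` are illustrative helpers invented for this example. Only the flag bit values come from the SAM specification. Actual MarkDuplicates behavior may differ between Picard versions.

```python
# SAM flag bits (values from the SAM specification)
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
DUPLICATE = 0x400


def remove_duplicates(records):
    """Drop every record carrying the duplicate flag.

    Assumption: this mimics the REMOVE_DUPLICATES=true behavior described
    in the comment above, where the unmapped mate is not flagged.
    """
    return [(name, flag) for name, flag in records if not flag & DUPLICATE]


def find_orphans(records):
    """Return names of paired reads for which only one end is present."""
    ends_seen = {}
    for name, flag in records:
        if flag & PAIRED:
            # Keep only the first/second-in-pair bits to identify the end.
            end = flag & (FIRST_IN_PAIR | SECOND_IN_PAIR)
            ends_seen.setdefault(name, set()).add(end)
    return sorted(name for name, ends in ends_seen.items() if len(ends) == 1)


# Hypothetical toy data: (read name, flag)
records = [
    ("pairA", PAIRED | PROPER_PAIR | FIRST_IN_PAIR),
    ("pairA", PAIRED | PROPER_PAIR | SECOND_IN_PAIR),
    # Mapped read whose mate is unmapped, tagged as a duplicate:
    ("pairB", PAIRED | MATE_UNMAPPED | FIRST_IN_PAIR | DUPLICATE),
    # Its unmapped mate, not tagged as a duplicate:
    ("pairB", PAIRED | UNMAPPED | SECOND_IN_PAIR),
]

before = find_orphans(records)
after = find_orphans(remove_duplicates(records))
print("orphans before removal:", before)
print("orphans after removal:", after)
```

In this toy case the unmapped mate of `pairB` survives the removal step while its mapped partner does not, which is the kind of record a validator would report as "mate not found".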

ADD REPLYlink modified 6.9 years ago • written 6.9 years ago by toni2.2k
6.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum130k wrote:

Do not use REMOVE_DUPLICATES=true.

Furthermore, you should keep all your reads :-) See: Should I remove the unmapped reads from my BAM?
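
As a rough sketch of what this advice amounts to, the command from the question can be assembled without the REMOVE_DUPLICATES option, so duplicates are only flagged (SAM flag 0x400) rather than dropped, since REMOVE_DUPLICATES defaults to false. The helper name `markduplicates_command` is hypothetical, and the jar path and file names are simply the placeholders from the question.

```python
def markduplicates_command(input_bam, output_bam, metrics_file):
    """Build the MarkDuplicates argument list from the question,
    leaving out REMOVE_DUPLICATES so duplicates are marked, not removed."""
    return [
        "java", "-Xmx4g", "-jar", "MarkDuplicates.jar",
        "CREATE_INDEX=true",
        f"INPUT={input_bam}",
        f"OUTPUT={output_bam}",
        f"METRICS_FILE={metrics_file}",
        "VALIDATION_STRINGENCY=Lenient",  # kept exactly as in the question
    ]


cmd = markduplicates_command("input.bam", "output.bam", "metrics.txt")
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` if you drive the pipeline from Python; downstream tools can then decide for themselves whether to skip reads carrying the duplicate flag.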

ADD COMMENTlink written 6.9 years ago by Pierre Lindenbaum130k

So I've never run into this being a problem before, which leads me to suspect something odd may be going on with this particular sample. The problem only became apparent because IndelRealigner threw an error about malformed read headers in the BAM file due to the missing mates, which halted my pipeline. And yes, I keep all the unmapped reads in my datasets. I'll try this and see if it clears up the problem; I suspect it will.

ADD REPLYlink written 6.9 years ago by DG7.1k

Hi, Dan

I have the same problem as you. Did removing REMOVE_DUPLICATES=true solve the missing pair problem? A follow-up on your problem would be appreciated.



ADD REPLYlink modified 9 months ago by RamRS30k • written 4.8 years ago by lilepisorus30

Yes it did. I thought I had originally accepted Pierre's answer as solving the issue. I have done so now.

ADD REPLYlink written 4.8 years ago by DG7.1k