Question: MarkDuplicates Creating A Loss Of Mate Pair
6.1 years ago, Dan Gaston (7.1k) wrote:

This is the first time I have hit this problem, after dozens of runs of my exome sequencing mapping, post-processing, and variant calling pipeline. The BAM file that MarkDuplicates outputs for this single sample is apparently missing the mates of many paired reads. Here is the command line I am using for MarkDuplicates:

java -Xmx4g -jar MarkDuplicates.jar CREATE_INDEX=true INPUT=input.bam OUTPUT=output.bam METRICS_FILE=metrics.txt REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=Lenient

When I check the input BAM file with ValidateSamFile it checks out fine; for the output, however, I get errors about the mate not being found for paired reads. Any ideas? Has anyone else had this happen?

exome gatk ngs markduplicates • 3.5k views

This is the behavior I have always seen with MarkDuplicates when removing duplicates: it generates orphan reads, and these orphan reads are unmapped.

Actually, it happens when you have a pair in which only one read is mapped. When the mapped read is tagged as a duplicate and removed, MarkDuplicates leaves its unmapped mate in the BAM.

modified 6.1 years ago • written 6.1 years ago by toni (2.1k)
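
To make the mechanism described above concrete, here is a minimal Python sketch. It is not Picard's implementation: it works on plain SAM text records, the helper names `sam` and `find_orphans` and the toy reads are hypothetical, and it assumes (as in the comment above) that only the mapped, duplicate-flagged read is dropped. It simulates removing every record that carries the duplicate flag (0x400) and then lists paired reads whose mate record is no longer present.

```python
from collections import defaultdict

# Standard SAM FLAG bits
PAIRED = 0x1
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def sam(name, flag):
    """Build a toy 11-column SAM record (hypothetical helper, fake alignment fields)."""
    return f"{name}\t{flag}\tchr1\t100\t60\t4M\t=\t200\t0\tACGT\tIIII"


def find_orphans(sam_lines):
    """Return {read_name: flag} for paired reads whose mate record is absent."""
    seen = defaultdict(dict)  # read name -> {"first" / "second": flag}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        # Only primary records of paired reads count toward mate bookkeeping
        if not flag & PAIRED or flag & (SECONDARY | SUPPLEMENTARY):
            continue
        end = "first" if flag & FIRST_IN_PAIR else "second"
        seen[name][end] = flag
    return {
        name: next(iter(ends.values()))
        for name, ends in seen.items()
        if len(ends) == 1
    }


before = [
    sam("r1", 99),    # properly paired, first in pair
    sam("r1", 147),   # properly paired, second in pair
    sam("r2", 1097),  # mapped, mate unmapped, first in pair, flagged duplicate
    sam("r2", 133),   # unmapped mate, second in pair, not flagged
]

# Simulate duplicate removal: drop every record carrying the duplicate flag
after = [rec for rec in before if not int(rec.split("\t")[1]) & DUPLICATE]

print("orphans before removal:", find_orphans(before))
print("orphans after removal: ", find_orphans(after))
```

In the toy data, read `r2` is the case described above: its mapped end is a flagged duplicate and its unmapped end is not, so removing flagged records leaves the unmapped end without a mate.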
6.1 years ago, Pierre Lindenbaum (124k; France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

Do not use REMOVE_DUPLICATES=true.

Furthermore, you should keep all your reads :-) See "Should I remove the unmapped reads from my BAM?"

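
As a rough sketch of the advice above, the Python snippet below assembles the same MarkDuplicates command line as in the question but without REMOVE_DUPLICATES=true, so duplicates are only flagged (SAM flag bit 0x400) and both mates stay in the output. The function names `mark_duplicates_cmd` and `is_flagged_duplicate` are hypothetical, and the jar location and heap size are simply copied from the question; adjust them to your own setup.

```python
import shlex

DUPLICATE = 0x400  # SAM FLAG bit set on reads marked as duplicates


def mark_duplicates_cmd(input_bam, output_bam, metrics_file,
                        jar="MarkDuplicates.jar", heap="4g"):
    """Build the argument list for MarkDuplicates without REMOVE_DUPLICATES.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    return [
        "java", f"-Xmx{heap}", "-jar", jar,
        "CREATE_INDEX=true",
        f"INPUT={input_bam}",
        f"OUTPUT={output_bam}",
        f"METRICS_FILE={metrics_file}",
        "VALIDATION_STRINGENCY=LENIENT",
        # REMOVE_DUPLICATES is deliberately left out: duplicates are then
        # flagged in the output rather than deleted from it.
    ]


def is_flagged_duplicate(flag):
    """True if a SAM FLAG value has the duplicate bit set."""
    return bool(flag & DUPLICATE)


cmd = mark_duplicates_cmd("input.bam", "output.bam", "metrics.txt")
print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to execute
```

Downstream code that needs to ignore duplicates can then test the flag per read, for example with `is_flagged_duplicate`, instead of relying on the records having been deleted.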

So, I've never run into this problem before, which leads me to suspect something funny may be going on with this particular sample. The problem only became apparent because IndelRealigner threw an error about malformed read headers in the BAM file, caused by the missing mates, which cut my pipeline short. And yes, I keep all the unmapped reads in my datasets. I'll try this and see if it clears up the problem; I suspect it will.

written 6.1 years ago by Dan Gaston (7.1k)

Hi, Dan

I have the same problem as you. Did removing "REMOVE_DUPLICATES=true" solve the missing mate problem? A follow-up on how you resolved it would be appreciated.




written 4.0 years ago by lilepisorus (30)

Yes, it did. I thought I had already accepted Pierre's answer as solving the issue; I have done so now.

written 4.0 years ago by Dan Gaston (7.1k)