Question: How to remove duplicates in a sorted BAM file using Picard?
4.6 years ago by GK1610 (United States):

I have a sorted paired-end BAM file.

I want to remove all paired-end duplicates, i.e. pairs where both ends align to the same positions:

-------->                <----------
-------->                <----------

NOT this kind, where only one end matches:

-------->                              <----------
-------->                <----------

Here is what I did:

java -jar -Xmx16g ~/picard/1.68/bin/picard-tools-1.68/MarkDuplicates.jar I=test.sorted.bam O=test.remove.duplicates.bam M=~/test.DupMetrics.txt REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=LENIENT

I am getting this error:

INFO    2016-07-24 17:31:43 MarkDuplicates  Start of doWork freeMemory: 2014511032; totalMemory: 2025979904; maxMemory: 15271002112
INFO    2016-07-24 17:31:43 MarkDuplicates  Reading input file and constructing read end information.
INFO    2016-07-24 17:31:43 MarkDuplicates  Will retain up to 60599214 data points before spilling to disk.
INFO    2016-07-24 17:31:53 MarkDuplicates  Read 1000000 records. Tracking 7879 as yet unmatched pairs. 752 records in RAM.  Last sequence index: 0
INFO    2016-07-24 17:32:01 MarkDuplicates  Read 2000000 records. Tracking 16466 as yet unmatched pairs. 1245 records in RAM.  Last sequence index: 0
INFO    2016-07-24 17:32:10 MarkDuplicates  Read 3000000 records. Tracking 24087 as yet unmatched pairs. 1610 records in RAM.  Last sequence index: 0
INFO    2016-07-24 17:32:16 MarkDuplicates  Read 4000000 records. Tracking 32294 as yet unmatched pairs. 1716 records in RAM.  Last sequence index: 0
INFO    2016-07-24 17:32:24 MarkDuplicates  Read 5000000 records. Tracking 38993 as yet unmatched pairs. 1676 records in RAM.  Last sequence index: 0
[Sun Jul 24 17:32:25 EDT 2016] net.sf.picard.sam.MarkDuplicates done. Elapsed time: 0.70 minutes.
Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once.  1: HBCC_ACC_382_C:K00225:15:H3VK7BBXX:1:1123:15524:31804
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(
    at net.sf.picard.sam.DiskReadEndsMap.remove(
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(
    at net.sf.picard.sam.MarkDuplicates.doWork(
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(
    at net.sf.picard.sam.MarkDuplicates.main(
chip-seq

It works.

Thanks :)

written 4.6 years ago by GK1610


I ran into the same error as you. Can you tell me how to resolve the "Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once." error?

Thanks a lot!

written 4.3 years ago by yihenghu
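For readers hitting the same exception, one way to investigate is to check whether any read end has more than one primary record in the file. This is only a sketch under assumptions the thread does not confirm: that the message points at a read name seen more often than expected, for example after merging overlapping files, or because an old Picard release does not recognise supplementary alignments (SAM flag 0x800). The function name `repeated_primary_ends` is made up for illustration; it works on plain SAM text lines such as those printed by `samtools view`.

```python
from collections import Counter

SECONDARY = 0x100       # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM flag bit: supplementary alignment
FIRST_IN_PAIR = 0x40    # SAM flag bit: first read of the pair
SECOND_IN_PAIR = 0x80   # SAM flag bit: second read of the pair


def repeated_primary_ends(sam_lines):
    """Return {(read_name, end): count} for read ends with >1 primary record.

    `end` is 1 or 2 for first/second in pair, 0 if neither bit is set.
    Hypothetical helper, not part of Picard or samtools.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:               # skip blank or malformed lines
            continue
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue                      # only primary records are counted
        if flag & FIRST_IN_PAIR:
            end = 1
        elif flag & SECOND_IN_PAIR:
            end = 2
        else:
            end = 0
        counts[(name, end)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

To use it, pipe `samtools view test.sorted.bam` into a small script that calls `repeated_primary_ends(sys.stdin)` and prints the result. Counting every read name keeps them all in memory, so on a large file it may be more practical to first filter for the read name reported in the exception.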
4.6 years ago by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087):

Your version of Picard (1.68) is just too old; it was released 4 years ago.

The current version is 2.5.
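As a rough sketch of what the same call looks like after upgrading: Picard 2.x ships a single `picard.jar`, and the tool name (`MarkDuplicates`) is passed as the first argument instead of being a separate jar. The helper name `markduplicates_command` and the jar location `picard.jar` are assumptions for illustration; the file names and options are the ones from the question.

```python
def markduplicates_command(picard_jar, in_bam, out_bam, metrics, max_heap="16g"):
    """Build the argument list for a Picard 2.x MarkDuplicates run.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "java", f"-Xmx{max_heap}", "-jar", picard_jar,
        "MarkDuplicates",                  # tool name comes right after the jar
        f"I={in_bam}",                     # coordinate-sorted input BAM
        f"O={out_bam}",                    # output BAM with duplicates removed
        f"M={metrics}",                    # duplication metrics file
        "REMOVE_DUPLICATES=true",          # drop duplicates instead of only flagging them
        "VALIDATION_STRINGENCY=LENIENT",
    ]


# "picard.jar" is a placeholder path; point it at your own Picard 2.x install.
cmd = markduplicates_command(
    "picard.jar",
    "test.sorted.bam",
    "test.remove.duplicates.bam",
    "test.DupMetrics.txt",
)
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` to execute it, or the printed line can be pasted into a shell.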
