Question: Problems with picard markduplicates
9 weeks ago, Cecelia wrote:

Hi, I was running picard markduplicates with a few bam files.

java -jar /sw/bioinfo/picard/2.20.4/rackham/picard.jar MarkDuplicates INPUT=sorted.bam OUTPUT=md.bam METRICS_FILE=duplicate.txt READ_NAME_REGEX=null REMOVE_DUPLICATES=true CREATE_INDEX=true

I got no output file, only a message like this:

INFO    2019-11-20 02:53:04 MarkDuplicates  Start of doWork freeMemory: 2037715440; totalMemory: 2058354688; maxMemory: 28631367680
INFO    2019-11-20 02:53:04 MarkDuplicates  Reading input file and constructing read end information.
INFO    2019-11-20 02:53:04 MarkDuplicates  Will retain up to 103736839 data points before spilling to disk.
[Wed Nov 20 02:53:05 CET 2019] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.02 minutes.
To get help, see
Exception in thread "main" htsjdk.samtools.SAMException: Sequence name 'scaffold1,8899378,f8056Z8899378' doesn't match regex: '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*' 
    at htsjdk.samtools.SAMSequenceRecord.validateSequenceName(
    at htsjdk.samtools.SAMSequenceRecord.<init>(
    at htsjdk.samtools.SAMTextHeaderCodec.parseSQLine(
    at htsjdk.samtools.SAMTextHeaderCodec.decode(
    at htsjdk.samtools.BAMFileReader.readHeader(
    at htsjdk.samtools.BAMFileReader.<init>(
    at htsjdk.samtools.BAMFileReader.<init>(
    at htsjdk.samtools.SamReaderFactory$
    at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(
    at picard.sam.markduplicates.MarkDuplicates.doWork(
    at picard.cmdline.CommandLineProgram.instanceMain(
    at picard.cmdline.PicardCommandLine.instanceMain(
    at picard.cmdline.PicardCommandLine.main(
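
Note on the trace: the exception is raised while the BAM header is being read (`parseSQLine` appears in the stack), so the name being rejected is a reference sequence name, not a read name. As a minimal sketch, the pattern printed in the error message can be tested directly in Python to see which character is responsible. The helper names `is_valid` and `offending_chars` are made up for this illustration and are not part of Picard or htsjdk.

```python
import re

# Pattern copied verbatim from the exception message above.
SEQ_NAME_RE = re.compile(
    r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"
)

# Character class used for every position after the first one.
ALLOWED_CHAR_RE = re.compile(r"[0-9A-Za-z!#$%&*+./:;=?@^_|~-]")


def is_valid(name):
    """True if the whole name matches the pattern from the error message."""
    return SEQ_NAME_RE.fullmatch(name) is not None


def offending_chars(name):
    """Sorted list of characters in `name` that the pattern does not allow."""
    return sorted({c for c in name if not ALLOWED_CHAR_RE.fullmatch(c)})


name = "scaffold1,8899378,f8056Z8899378"
print(is_valid(name), offending_chars(name))
```

For the name in the error message, the only character outside the allowed set is the comma.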

The reference genome is the output from LINKS v1.8.7. The header of each contig should look like this: (scaffold,size,contig information)


If the problem is with the header, how should I change it without losing information?
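
One possible approach, sketched here under assumptions rather than as a confirmed fix: keep all three fields of the header but swap the commas for a character that the pattern in the error message allows. The function names, the file paths in the usage comment, and the choice of `_` as the separator are all hypothetical. If the contig names already contain `_`, another allowed character would keep the fields distinguishable.

```python
def rename_header(line, sep="_"):
    """Replace commas in a FASTA header line; sequence lines pass through."""
    if line.startswith(">"):
        return line.replace(",", sep)
    return line


def rename_fasta(src_path, dst_path, sep="_"):
    """Copy a FASTA file, rewriting only the header lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(rename_header(line, sep))


# Hypothetical usage (paths are placeholders, not from the original post):
# rename_fasta("links_scaffolds.fa", "links_scaffolds.renamed.fa")
```

The names in the BAM header would have to change in the same way, for example by aligning again against the renamed reference or by rewriting the header with `samtools reheader`.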

I read in this post that setting READ_NAME_REGEX=null could solve the problem, but it did not work in my case.

Any comments or suggestions will be appreciated.

modified 7 weeks ago by Biostar • written 9 weeks ago by Cecelia

Try adding a comma to the second part of the regex (the full regex would be '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~,-]*') and using that as the READ_NAME_REGEX.

written 9 weeks ago by RamRS