Question: Problems with picard markduplicates
15 months ago, Cecelia20 wrote:

Hi, I was running Picard MarkDuplicates on a few BAM files.

java -jar /sw/bioinfo/picard/2.20.4/rackham/picard.jar MarkDuplicates INPUT=sorted.bam OUTPUT=md.bam METRICS_FILE=duplicate.txt READ_NAME_REGEX=null REMOVE_DUPLICATES=true CREATE_INDEX=true

I got no output file, only a message like this:

INFO    2019-11-20 02:53:04 MarkDuplicates  Start of doWork freeMemory: 2037715440; totalMemory: 2058354688; maxMemory: 28631367680
INFO    2019-11-20 02:53:04 MarkDuplicates  Reading input file and constructing read end information.
INFO    2019-11-20 02:53:04 MarkDuplicates  Will retain up to 103736839 data points before spilling to disk.
[Wed Nov 20 02:53:05 CET 2019] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.02 minutes.
To get help, see
Exception in thread "main" htsjdk.samtools.SAMException: Sequence name 'scaffold1,8899378,f8056Z8899378' doesn't match regex: '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*' 
    at htsjdk.samtools.SAMSequenceRecord.validateSequenceName(
    at htsjdk.samtools.SAMSequenceRecord.<init>(
    at htsjdk.samtools.SAMTextHeaderCodec.parseSQLine(
    at htsjdk.samtools.SAMTextHeaderCodec.decode(
    at htsjdk.samtools.BAMFileReader.readHeader(
    at htsjdk.samtools.BAMFileReader.<init>(
    at htsjdk.samtools.BAMFileReader.<init>(
    at htsjdk.samtools.SamReaderFactory$
    at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(
    at picard.sam.markduplicates.MarkDuplicates.doWork(
    at picard.cmdline.CommandLineProgram.instanceMain(
    at picard.cmdline.PicardCommandLine.instanceMain(
    at picard.cmdline.PicardCommandLine.main(

The reference genome is the output from LINKS v1.8.7. The header of each contig should look like this: (scaffold,size,contig information)


If the problem is with the header, how should I change it without losing information?

I read in this post that setting READ_NAME_REGEX=null could solve the problem, but it did not work in my case.

Any comments or suggestions will be appreciated.

modified 14 months ago by Biostar ♦♦ 20 • written 15 months ago by Cecelia20

Try adding a comma to the second part of the regex (the full regex would be '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~,-]*') and using that as the READ_NAME_REGEX.

written 15 months ago by Ram32k
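
One thing worth checking before relying on READ_NAME_REGEX: the stack trace shows the exception coming from htsjdk's SAMSequenceRecord.validateSequenceName while the BAM header is being read, so the pattern in the message applies to the sequence names on the @SQ lines. READ_NAME_REGEX, as far as the Picard documentation describes it, controls how read names are parsed for optical duplicate detection, so changing it may not affect this check. An alternative is to rename the contigs so that they no longer contain commas. Below is a minimal Python sketch of that idea. The function names and the choice of "_" as the replacement character are my own assumptions for illustration, not anything Picard or LINKS provides. Replacing "," with "_" keeps all three parts of the name, and the change stays reversible as long as the original names contain no underscores.

```python
import re

# Sequence-name pattern copied verbatim from the exception message.
LEGAL_NAME = re.compile(
    r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"
)


def is_legal(name):
    """Return True if the whole name matches the pattern from the error."""
    return LEGAL_NAME.fullmatch(name) is not None


def sanitize(name, replacement="_"):
    """Replace commas with `replacement` (an assumed, illustrative choice)."""
    return name.replace(",", replacement)


def fix_sq_line(line, replacement="_"):
    """Rewrite the SN: field of one @SQ header line; other lines pass through."""
    if not line.startswith("@SQ"):
        return line
    fields = line.rstrip("\n").split("\t")
    fields = [
        "SN:" + sanitize(f[3:], replacement) if f.startswith("SN:") else f
        for f in fields
    ]
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    # Name taken from the error message; the LN value is only illustrative.
    name = "scaffold1,8899378,f8056Z8899378"
    print(is_legal(name))
    print(sanitize(name))
    print(is_legal(sanitize(name)))
    print(fix_sq_line("@SQ\tSN:" + name + "\tLN:8899378\n"), end="")
```

The fixed header text could then be applied to the existing BAM with samtools reheader, which avoids re-aligning. The reference FASTA would need the same renaming so that the names stay consistent between the two files.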