Question: Problem with Picard MarkDuplicates
0
gravatar for Ming
4 months ago by
Ming40
Ming40 wrote:

Dear All,

I am trying to use the Picard MarkDuplicate tools, but came across the following error:

(base) tanshiming@S620100019205:~/Downloads/sorted-bam$ java -jar /home/tanshiming/tools/picard-2.21.1/picard.jar MarkDuplicates M=1B131118A.marked_dup_metrics.txt REMOVE_DUPLICATES=true I=1B131118B.sorted.bam  O=1B131118B.bwa.dedupe.bam
INFO    2019-10-15 16:43:25 MarkDuplicates  

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    MarkDuplicates -M 1B131118A.marked_dup_metrics.txt -REMOVE_DUPLICATES true -I 1B131118B.sorted.bam -O 1B131118B.bwa.dedupe.bam
**********


16:43:25.875 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/tanshiming/tools/picard-2.21.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Oct 15 16:43:25 SGT 2019] MarkDuplicates INPUT=[1B131118B.sorted.bam] OUTPUT=1B131118B.bwa.dedupe.bam METRICS_FILE=1B131118A.marked_dup_metrics.txt REMOVE_DUPLICATES=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture="" of="" last="" three="" ':'="" separated="" fields="" as="" numeric="" values=""> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Oct 15 16:43:25 SGT 2019] Executing as tanshiming@S620100019205 on Linux 4.15.0-58-generic amd64; Java HotSpot(TM) 64-Bit Server VM 12.0.2+10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.1-SNAPSHOT
INFO    2019-10-15 16:43:25 MarkDuplicates  Start of doWork freeMemory: 158532400; totalMemory: 167772160; maxMemory: 32178700288
INFO    2019-10-15 16:43:25 MarkDuplicates  Reading input file and constructing read end information.
INFO    2019-10-15 16:43:25 MarkDuplicates  Will retain up to 116589493 data points before spilling to disk.
[Tue Oct 15 16:45:48 SGT 2019] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 2.38 minutes.
Runtime.totalMemory()=17448304640
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: Number of sequences in text header (34216887) != number of sequences in binary header (182084669) for file /home/tanshiming/Downloads/sorted-bam/1B131118B.sorted.bam
    at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:712)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:396)
    at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(AbstractMarkDuplicatesCommandLineProgram.java:220)
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:533)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:257)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Appreciate it if I could get any advice on this! Thank You.

mapping • 325 views
ADD COMMENTlink modified 4 months ago by prishly10 • written 4 months ago by Ming40
1

Hi Ming, It seems there is something wrong with your bam file header. Can you post the code on how you get the input bam file (from mapping output to your input)?

ADD REPLYlink written 4 months ago by yztxwd290

there is a difference between the number of chromosomes declared in the SAM header and it's BAM. How was built 1B131118B.sorted.bam ?

ADD REPLYlink written 4 months ago by Pierre Lindenbaum126k

Dear All,

Thanks for the reply. Below are the commands on how I got the bam file:

CONTIGS=contigs.fa 
REF=contigs-bowtie2-reference

# Build reference db 
bowtie2-build $CONTIGS $REF

# Mapping of reads 
for x in *_R1_001.fastq.gz
    do bowtie2 -p 50 -x $REF -1 $x -2 ${x%_R1_001*}_R2_001.fastq.gz -S ${x%_*_*_001*}.sam
done

# Convert sam to bam and sort
for x in *.sam
do samtools view -bS $x | samtools sort - -o ${x/.sam/.sorted.bam}
done

Thanks!

ADD REPLYlink written 4 months ago by Ming40

I haven't seen anything in your code could be the problem, have you compared the header of your sam file with that of your sorted bam file? Are they the same?

ADD REPLYlink written 4 months ago by yztxwd290

Hello @yztxwd,

How do I go about doing this?

Thanks

ADD REPLYlink written 4 months ago by Ming40

samtools -H can show you the header You can also follow the suggestions by @prishly, it implies you can use pybam to check the text header and binary header of bam file, sometimes simply reinstalling or updating your samtools can solve the problem

ADD REPLYlink modified 4 months ago • written 4 months ago by yztxwd290

Dear @yztxwd,

Should I just inspect the text header of the bam and the sam files? I am guessing the header should match? Will a conda-installed version of Samtools be fine?

Thank you.

ADD REPLYlink modified 4 months ago • written 4 months ago by Ming40

Dear All,

I have tried using the samtools version 1.9 (latest version), but I am still facing the same issue. Appreciate any help that I can get!

Thank you.

ADD REPLYlink written 4 months ago by Ming40

Likely not solving your problem, but the line below is more complicated than necessary:

samtools view -bS $x | samtools sort - -o ${x/.sam/.sorted.bam}

You could just use samtools sort directly:

samtools sort $x -o ${x/.sam/.sorted.bam}

Likewise, you could avoid the intermediate sam file and pipe directly to samtools sort to get a sorted bam:

bowtie2 -p 50 -x $REF -1 $x -2 ${x%_R1_001*}_R2_001.fastq.gz | samtools sort -o ${x%_*_*_001*}.bam
ADD REPLYlink modified 4 months ago • written 4 months ago by WouterDeCoster43k
1
gravatar for prishly
4 months ago by
prishly10
prishly10 wrote:

Take a look at this post. Gurus recommend updating samtools

ADD COMMENTlink written 4 months ago by prishly10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1165 users visited in the last hour