Question: Problem with Picard MarkDuplicates
0
gravatar for Ming
29 days ago by
Ming40
Ming40 wrote:

Dear All,

I am trying to use the Picard MarkDuplicate tools, but came across the following error:

(base) tanshiming@S620100019205:~/Downloads/sorted-bam$ java -jar /home/tanshiming/tools/picard-2.21.1/picard.jar MarkDuplicates M=1B131118A.marked_dup_metrics.txt REMOVE_DUPLICATES=true I=1B131118B.sorted.bam  O=1B131118B.bwa.dedupe.bam
INFO    2019-10-15 16:43:25 MarkDuplicates  

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    MarkDuplicates -M 1B131118A.marked_dup_metrics.txt -REMOVE_DUPLICATES true -I 1B131118B.sorted.bam -O 1B131118B.bwa.dedupe.bam
**********


16:43:25.875 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/tanshiming/tools/picard-2.21.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Oct 15 16:43:25 SGT 2019] MarkDuplicates INPUT=[1B131118B.sorted.bam] OUTPUT=1B131118B.bwa.dedupe.bam METRICS_FILE=1B131118A.marked_dup_metrics.txt REMOVE_DUPLICATES=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture="" of="" last="" three="" ':'="" separated="" fields="" as="" numeric="" values=""> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Oct 15 16:43:25 SGT 2019] Executing as tanshiming@S620100019205 on Linux 4.15.0-58-generic amd64; Java HotSpot(TM) 64-Bit Server VM 12.0.2+10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.1-SNAPSHOT
INFO    2019-10-15 16:43:25 MarkDuplicates  Start of doWork freeMemory: 158532400; totalMemory: 167772160; maxMemory: 32178700288
INFO    2019-10-15 16:43:25 MarkDuplicates  Reading input file and constructing read end information.
INFO    2019-10-15 16:43:25 MarkDuplicates  Will retain up to 116589493 data points before spilling to disk.
[Tue Oct 15 16:45:48 SGT 2019] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 2.38 minutes.
Runtime.totalMemory()=17448304640
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: Number of sequences in text header (34216887) != number of sequences in binary header (182084669) for file /home/tanshiming/Downloads/sorted-bam/1B131118B.sorted.bam
    at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:712)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:396)
    at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(AbstractMarkDuplicatesCommandLineProgram.java:220)
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:533)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:257)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Appreciate it if I could get any advice on this! Thank You.

mapping • 187 views
ADD COMMENTlink modified 29 days ago by prishly10 • written 29 days ago by Ming40
1

Hi Ming, It seems there is something wrong with your bam file header. Can you post the code on how you get the input bam file (from mapping output to your input)?

ADD REPLYlink written 29 days ago by yztxwd170

there is a difference between the number of chromosomes declared in the SAM header and it's BAM. How was built 1B131118B.sorted.bam ?

ADD REPLYlink written 29 days ago by Pierre Lindenbaum124k

Dear All,

Thanks for the reply. Below are the commands on how I got the bam file:

CONTIGS=contigs.fa 
REF=contigs-bowtie2-reference

# Build reference db 
bowtie2-build $CONTIGS $REF

# Mapping of reads 
for x in *_R1_001.fastq.gz
    do bowtie2 -p 50 -x $REF -1 $x -2 ${x%_R1_001*}_R2_001.fastq.gz -S ${x%_*_*_001*}.sam
done

# Convert sam to bam and sort
for x in *.sam
do samtools view -bS $x | samtools sort - -o ${x/.sam/.sorted.bam}
done

Thanks!

ADD REPLYlink written 29 days ago by Ming40

I haven't seen anything in your code could be the problem, have you compared the header of your sam file with that of your sorted bam file? Are they the same?

ADD REPLYlink written 29 days ago by yztxwd170

Hello @yztxwd,

How do I go about doing this?

Thanks

ADD REPLYlink written 28 days ago by Ming40

samtools -H can show you the header You can also follow the suggestions by @prishly, it implies you can use pybam to check the text header and binary header of bam file, sometimes simply reinstalling or updating your samtools can solve the problem

ADD REPLYlink modified 28 days ago • written 28 days ago by yztxwd170

Dear @yztxwd,

Should I just inspect the text header of the bam and the sam files? I am guessing the header should match? Will a conda-installed version of Samtools be fine?

Thank you.

ADD REPLYlink modified 27 days ago • written 27 days ago by Ming40

Dear All,

I have tried using the samtools version 1.9 (latest version), but I am still facing the same issue. Appreciate any help that I can get!

Thank you.

ADD REPLYlink written 23 days ago by Ming40

Likely not solving your problem, but the line below is more complicated than necessary:

samtools view -bS $x | samtools sort - -o ${x/.sam/.sorted.bam}

You could just use samtools sort directly:

samtools sort $x -o ${x/.sam/.sorted.bam}

Likewise, you could avoid the intermediate sam file and pipe directly to samtools sort to get a sorted bam:

bowtie2 -p 50 -x $REF -1 $x -2 ${x%_R1_001*}_R2_001.fastq.gz | samtools sort -o ${x%_*_*_001*}.bam
ADD REPLYlink modified 23 days ago • written 23 days ago by WouterDeCoster42k
1
gravatar for prishly
29 days ago by
prishly10
prishly10 wrote:

Take a look at this post. Gurus recommend updating samtools

ADD COMMENTlink written 29 days ago by prishly10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1838 users visited in the last hour