How to remove duplicates with Picard
3.2 years ago
pt.taklifi ▴ 60

Hello everyone, I am trying to remove duplicates from a BAM file using Picard with the command below:

java -jar picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=hg38.r.bam O=hg38.dedup.bam M=metrices.txt

When I run this command, I get this message:

    21:48:20.762 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/media/kavousi/eaf2d15a-4cb1-4dee-ade8-6954bdc813e1/Taklifi/dbgap/fastq/SRR10984462/bam/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sun Feb 28 21:48:20 IRST 2021] MarkDuplicates INPUT=[hg38.r.bam] OUTPUT=hg38.dedup.bam METRICS_FILE=metrices.txt REMOVE_DUPLICATES=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sun Feb 28 21:48:20 IRST 2021] Executing as ptaklifi@ibb-server on Linux 5.4.0-45-generic amd64; OpenJDK 64-Bit Server VM 11.0.10+9-Ubuntu-0ubuntu1.18.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.25.0
INFO    2021-02-28 21:48:20 MarkDuplicates  Start of doWork freeMemory: 178254872; totalMemory: 184549376; maxMemory: 16777216000
INFO    2021-02-28 21:48:20 MarkDuplicates  Reading input file and constructing read end information.
INFO    2021-02-28 21:48:20 MarkDuplicates  Will retain up to 60787014 data points before spilling to disk.
[Sun Feb 28 21:48:20 IRST 2021] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=671088640
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line:
@RG ID:SRR10984462; File /media/kavousi/eaf2d15a-4cb1-4dee-ade8-6954bdc813e1/Taklifi/dbgap/fastq/SRR10984462/bam/hg38.r.bam; Line number 197
    at htsjdk.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:258)
    at htsjdk.samtools.SAMTextHeaderCodec.access$200(SAMTextHeaderCodec.java:46)
    at htsjdk.samtools.SAMTextHeaderCodec$ParsedHeaderLine.requireTag(SAMTextHeaderCodec.java:358)
    at htsjdk.samtools.SAMTextHeaderCodec.parseRGLine(SAMTextHeaderCodec.java:168)
    at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:110)
    at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:704)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:406)
    at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(AbstractMarkDuplicatesCommandLineProgram.java:262)
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:508)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:257)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)

I'm not sure whether the problem is with the command syntax or with the input file, or how I would fix it.

alignment deduplication • 7.5k views

The error is written in the message itself:

Cannot read non-existent file: file:///media/kavousi/eaf2d15a-4cb1-4dee-ade8-6954bdc813e1/Taklifi/dbgap/fastq/SRR10984462/bam/hg38.rbam


Thank you. I'm sorry, I misspelled the input file. I fixed it and ran the command again. I edited my post to include the new error message; as you can see, I now get a different error:

@RG ID:SRR10984462; File /media/kavousi/eaf2d15a-4cb1-4dee-ade8-6954bdc813e1/Taklifi/dbgap/fastq/SRR10984462/bam/hg38.r.bam; Line number 197
    at htsjdk.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:258)
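
The exception above says that the `@RG` header line has no `SM` (sample) tag, and with `VALIDATION_STRINGENCY=STRICT` the header is rejected before any duplicate marking starts. As a rough sketch, the small helper below (the function name `rg_missing_sm` is made up for illustration) lists the `@RG` lines that lack `SM`. It assumes the header text is piped in, for example from `samtools view -H`. The commented command at the end shows one possible fix using Picard's `AddOrReplaceReadGroups`; the library, platform, unit and sample values there are placeholders, not values taken from this thread.

```shell
#!/bin/sh
# Print every @RG header line that has no SM (sample) tag.
# Expects SAM header text on stdin (tab-separated fields).
rg_missing_sm() {
    awk -F '\t' '
        /^@RG/ {
            has_sm = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SM:/) has_sm = 1
            }
            if (!has_sm) print
        }'
}

# Typical use (assumes samtools is installed):
#   samtools view -H hg38.r.bam | rg_missing_sm
#
# One possible fix: rewrite the read group, then run MarkDuplicates on the new file.
# RGLB/RGPL/RGPU/RGSM and the output name are placeholders; use your real metadata.
#   java -jar picard.jar AddOrReplaceReadGroups I=hg38.r.bam O=hg38.rg.bam \
#       RGID=SRR10984462 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=SRR10984462
```

If the helper prints nothing, every read group already carries a sample name and the problem lies elsewhere.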
3.2 years ago
Mensur Dlakic ★ 27k

I don't think the command you gave us is right, as it doesn't match the rest of the output. For example, the log says INPUT=[hg38.rbam], which does not match your command line. Either way, the error is clear, and the file is not there:

Cannot read non-existent file: file:///media/kavousi/eaf2d15a-4cb1-4dee-ade8-6954bdc813e1/Taklifi/dbgap/fastq/SRR10984462/bam/hg38.rbam

It appears that you may be reading the file from a flash drive or something similar. It would be much faster to copy it to a local disk, and then specify the file name correctly.


rbam is quite an unusual suffix. @pt.taklifi, can you confirm that the command you posted is indeed the one you ran? Or is this part of a script, and this rbam is some intermediate file that the pipeline or Picard itself produces? I am not a Picard user myself, but afaik the tool takes two passes over the BAM file for deduplication, so maybe this is indeed some intermediate file it writes which (for a reason yet to be found) is not accessible, maybe related to the input file being located on that /media flash drive?

