Question: Picard MarkDuplicates shows error (removing PCR duplicates)
ashaneev07 wrote (3 months ago):

Hi, I got the following error while running Picard's MarkDuplicates. Does anyone have experience with this command in Picard? I need help.

> java -jar picard.jar MarkDuplicates  I=300BP.sorted O=marked_duplicates_300.bam M=marked_dup_metrics.txt  REMOVE_DUPLICATES=true &

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    MarkDuplicates -I /300BP.sorted -O marked_duplicates_300.bam -M marked_dup_metrics.txt -REMOVE_DUPLICATES true
**********


15:13:11.216 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/home/Documents/Tools_NGS/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Nov 21 15:13:11 IST 2018] MarkDuplicates INPUT=[300BP.sorted] OUTPUT=marked_duplicates_300.bam METRICS_FILE=marked_dup_metrics.txt REMOVE_DUPLICATES=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Nov 21 15:13:11 IST 2018] Executing as home@home-Lenovo-H30-50 on Linux 4.4.0-31-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_171-b11; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.14-SNAPSHOT
INFO    2018-11-21 15:13:11 MarkDuplicates  Start of doWork freeMemory: 240890984; totalMemory: 251658240; maxMemory: 3720871936
INFO    2018-11-21 15:13:11 MarkDuplicates  Reading input file and constructing read end information.
INFO    2018-11-21 15:13:11 MarkDuplicates  Will retain up to 13481420 data points before spilling to disk.
[Wed Nov 21 15:13:13 IST 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=1302331392
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: /tmp/home/CSPI.8946166571745516868.tmp/20922.tmp not found
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:64)
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:49)
    at htsjdk.samtools.util.ResourceLimitedMap.get(ResourceLimitedMap.java:76)
    at htsjdk.samtools.CoordinateSortedPairInfoMap.getOutputStreamForSequence(CoordinateSortedPairInfoMap.java:180)
    at htsjdk.samtools.CoordinateSortedPairInfoMap.put(CoordinateSortedPairInfoMap.java:164)
    at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.put(DiskBasedReadEndsForMarkDuplicatesMap.java:65)
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:543)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:232)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.io.FileNotFoundException: /tmp/home/CSPI.8946166571745516868.tmp/20922.tmp (Too many open files)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:61)
... 10 more
Tags: snp, sequence alignment

Can you try with:

java -Djava.io.tmpdir=. -jar picard.jar -I 300BP.sorted (etc...)
Reply by Pierre Lindenbaum (3 months ago)

I tried as you mentioned, and now it shows:

'I=300BP.sorted.bam' is not a valid command

Reply by ashaneev07 (3 months ago)

Sorry, I forgot the tool name after the jar...

java -Djava.io.tmpdir=. -jar picard.jar MarkDuplicates -I 300BP.sorted (etc...)
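That is, the full command would look something like this (a sketch only, assuming your input BAM is actually named 300BP.sorted.bam):

$ java -Djava.io.tmpdir=. -jar picard.jar MarkDuplicates \
      -I 300BP.sorted.bam \
      -O marked_duplicates_300.bam \
      -M marked_dup_metrics.txt \
      -REMOVE_DUPLICATES true

Picard also accepts a TMP_DIR argument (e.g. TMP_DIR=./tmp), which likewise moves the temporary files out of /tmp.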
Reply by Pierre Lindenbaum (3 months ago)
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    MarkDuplicates -I 300BP.sorted -O marked_duplicates_300.bam -M marked_dup_metrics.txt -REMOVE_DUPLICATES true
**********


16:18:14.987 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/veena/Documents/Tools_NGS/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Nov 21 16:18:15 IST 2018] MarkDuplicates INPUT=[300BP.sorted] OUTPUT=marked_duplicates_300.bam METRICS_FILE=marked_dup_metrics.txt REMOVE_DUPLICATES=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Nov 21 16:18:15 IST 2018] Executing as veena@veena-Lenovo-H30-50 on Linux 4.4.0-31-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_171-b11; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.14-SNAPSHOT
[Wed Nov 21 16:18:15 IST 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=251658240
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: file:///home/veena/Documents/Tools_NGS/300BP.sorted
    at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:430)
    at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:417)
    at htsjdk.samtools.util.IOUtil.assertInputIsValid(IOUtil.java:393)
    at htsjdk.samtools.util.IOUtil.assertInputsAreValid(IOUtil.java:469)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:224)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Reply by ashaneev07 (3 months ago)

Maybe not the main problem, but 300BP.sorted should have a proper extension ('sam' or 'bam').

Reply by Pierre Lindenbaum (3 months ago)

What is the output of

file 300BP.sorted

?

Reply by Pierre Lindenbaum (3 months ago)

The 300BP.sorted file is a sorted BAM file.

Reply by ashaneev07 (3 months ago)

The 300BP.sorted file is a sorted BAM file.

This is not the output of the command 'file'.

Reply by Pierre Lindenbaum (3 months ago)

Could you please explain the meaning of your previous statement more fully? I didn't get any output file from the above command.

Reply by ashaneev07 (3 months ago)

file <filename> prints out information about the file type. For a valid BAM file you should get the following message in your terminal:

$ file input.bam
input.bam: gzip compressed data, extra field
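If samtools is installed, you could also run a more thorough integrity check (a suggestion beyond the file test; quickcheck verifies the BAM header and end-of-file block):

$ samtools quickcheck -v 300BP.sorted.bam && echo "ok" || echo "truncated or corrupt"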

fin swimmer

Reply by finswimmer (3 months ago)

Yes, I got it:

$ file 300BP.sorted.bam
300BP.sorted.bam: gzip compressed data, extra field

Reply by ashaneev07 (3 months ago)
Caused by: java.io.FileNotFoundException: /tmp/home/CSPI.8946166571745516868.tmp/20922.tmp (Too many open files)

This looks like MarkDuplicates needs to create many temporary files and keep them open at the same time. Most distributions have a limit of 1024 open files by default. You can check this with ulimit -n. For the current shell you can set it to a higher number, e.g. ulimit -n 2048.
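For example, in bash (the -S/-H flags distinguish the soft limit from the hard ceiling):

$ ulimit -Sn       # current soft limit on open files
$ ulimit -Hn       # hard limit; an unprivileged user can only raise the soft limit up to this
$ ulimit -n 2048   # raise the soft limit for the current shell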

fin swimmer

Reply by finswimmer (3 months ago)

I got this:

$ ulimit -n
1024
$ ulimit -n 2048
bash: ulimit: open files: cannot modify limit: Operation not permitted

Reply by ashaneev07 (3 months ago)

There seem to be a lot of reasons and solutions for why this message appears. As I don't know your system, I would recommend searching the web for this error message to find a way to increase the limit on your system.
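As a sketch (the exact mechanism varies by distribution, so please verify for your system): on many Linux systems the persistent fix is to raise the per-user limits in /etc/security/limits.conf as root and log in again:

# /etc/security/limits.conf -- illustrative values for user veena
veena  soft  nofile  4096
veena  hard  nofile  8192

Alternatively, you can tell MarkDuplicates itself to use fewer open file handles by lowering MAX_FILE_HANDLES_FOR_READ_ENDS_MAP (visible in your log above, default 8000) to a value below the output of ulimit -n, e.g. MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000.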

fin swimmer

Reply by finswimmer (3 months ago)