Question: Using a bash variable in a Picard MarkDuplicates command
bioguy24 (Chicago) wrote:

In the bash script below, the two variables $bname and $sample are extracted correctly. However, when I pass them to the java command, $bname ends up as file:///home/cmccabe/NA12878.bam and Picard throws an exception. I can only guess that $sample is likewise set to ///home/cmccabe/NA12878. I am not sure what I am doing wrong: can a bash variable not be used in a java command, or is something else going on? Thank you :)

input

/home/cmccabe/Desktop/fastq/NA12878.bam

result of echo

The bam file is: NA12878.bam   --- this is $bname
The matching sample is:NA12878   --- this is $sample

bash

for file in /home/cmccabe/Desktop/fastq/*.bam
do
    bname=$(basename $file)
    echo "The bam file is:" $bname
    sample=$(basename $file .bam | cut -d- -f1)
    echo "The matching sample is:"$sample
    java -XX:ParallelGCThreads=16 -jar /home/cmccabe/Desktop/fastq/picard/build/libs/picard.jar MarkDuplicates \
      I=$bname \
      O=${sample}_marked_duplicates.bam \
      M=${sample}_marked_dup_metrics.txt
done
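For reference, the two extractions in the loop can be reproduced on a single path. This is a small sketch; the second file name, NA12878-L001.bam, is hypothetical and only illustrates what `cut -d- -f1` removes. Note that `basename` leaves no directory in $bname.

```shell
#!/usr/bin/env bash
# Reproduce the two extractions from the loop on one path.
file=/home/cmccabe/Desktop/fastq/NA12878.bam

# basename drops every leading directory, so $bname is a bare file name.
bname=$(basename "$file")
# With a suffix argument, basename also drops ".bam"; cut then keeps the
# text before the first "-" (the whole name here, since it has no "-").
sample=$(basename "$file" .bam | cut -d- -f1)
# Hypothetical name with a "-" suffix, to show what cut removes.
lane_sample=$(basename /home/cmccabe/Desktop/fastq/NA12878-L001.bam .bam | cut -d- -f1)

echo "bname=$bname sample=$sample lane_sample=$lane_sample"
```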

Can you paste the error thrown by Picard?

Reply by James Ashmore

Sorry, here is the full output: the echoed variables are at the top and the error is at the bottom. Thank you :)

The bam file is: NA12878.bam
The matching sample is:NA12878
12:53:28.480 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/cmccabe/Desktop/fastq/picard/build/libs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Jul 26 12:53:28 CDT 2018] MarkDuplicates INPUT=[NA12878.bam] OUTPUT=NA12878_marked_duplicates.bam METRICS_FILE=NA12878_marked_dup_metrics.txt    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Jul 26 12:53:28 CDT 2018] Executing as cmccabe@DTV-A5211QLM on Linux 4.4.0-131-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.9-9-g8e25161-SNAPSHOT
[Thu Jul 26 12:53:28 CDT 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2024275968
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: file:///home/cmccabe/NA12878.bam
    at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:430)
    at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:417)
    at htsjdk.samtools.util.IOUtil.assertInputIsValid(IOUtil.java:393)
    at htsjdk.samtools.util.IOUtil.assertInputsAreValid(IOUtil.java:469)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:224)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Reply by bioguy24

Word to the wise: you should also quote your variables, e.g. "${var}", to stop nasty, unexpected word splitting and glob expansion when they are expanded.
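A minimal illustration of what quoting prevents. The helper `count_args` and the file name containing a space are hypothetical, chosen only to make the splitting visible; the sketch prints how many arguments a command actually receives.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments a command receives.
count_args() { echo "$#"; }

# Hypothetical BAM path containing a space.
file="/home/cmccabe/Desktop/fastq/NA12878 rep1.bam"

# Unquoted: the expansion is split on whitespace (and would also be
# glob-expanded), so the command receives two arguments.
unquoted=$(count_args $file)
# Quoted: the value is passed through as a single argument.
quoted=$(count_args "$file")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

With such a name, an unquoted `I=$file` would hand Picard `I=/home/cmccabe/Desktop/fastq/NA12878` plus a stray `rep1.bam` argument, while `I="$file"` keeps the path in one piece.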

Reply by jrj.healey

@cmccabe, try replacing $bname with $file (or ${file}).

Reply by cpad0112

Dan D (Tennessee) wrote:

The problem is not entirely with your use of variables: you're pointing Picard to the wrong location for the BAM (see the Picard output you pasted in the comment):

Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: file:///home/cmccabe/NA12878.bam

In your picard command, change this:

I=$bname

to this:

I=/home/cmccabe/Desktop/fastq/$bname

And that should work for you.
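Applied to the whole loop, the change might look like the sketch below. It is a dry run: each Picard command is printed with `echo` rather than executed, so remove the leading `echo` to actually run it. The function name `mark_dups_in_dir` is hypothetical, and the variables are quoted as suggested in the comments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run MarkDuplicates on every BAM in one directory.
# Dry run: the leading "echo" prints each command instead of running it.
mark_dups_in_dir() {
    local dir=$1 file bname sample
    local jar=/home/cmccabe/Desktop/fastq/picard/build/libs/picard.jar
    for file in "$dir"/*.bam; do
        [ -e "$file" ] || continue   # no match: the glob is left unexpanded
        bname=$(basename "$file")
        sample=$(basename "$file" .bam | cut -d- -f1)
        # The fix: prefix the bare file name with its directory.
        echo java -XX:ParallelGCThreads=16 -jar "$jar" MarkDuplicates \
            I="$dir/$bname" \
            O="${sample}_marked_duplicates.bam" \
            M="${sample}_marked_dup_metrics.txt"
    done
}

mark_dups_in_dir /home/cmccabe/Desktop/fastq
```

Note that `"$dir/$bname"` is the same string as `"$file"`, which is why using `I=$file` directly works just as well. The O= and M= outputs are still relative names, so they are written to whatever directory the script is run from.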

h.mon (Brazil) wrote:

$bname is not being set to file:///home/cmccabe/NA12878.bam; your output shows this clearly:

The bam file is: NA12878.bam

And

INPUT=[NA12878.bam] OUTPUT=NA12878_marked_duplicates.bam

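The problem is where you run the script: it only works if you run it from the same folder where the BAM files are located, because a bare file name such as NA12878.bam is resolved relative to the current working directory. What you want is: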

      I=$file \

Alternatively, cd into the folder where the BAMs are located.
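The working-directory explanation can be checked without Picard. The sketch below uses a throwaway directory from `mktemp -d` and an empty placeholder file in place of the real BAM folder; it only tests whether each path exists, which is the condition behind the "Cannot read non-existent file" error.

```shell
#!/usr/bin/env bash
# Show that a bare file name is looked up in the current working directory.
demo_dir=$(mktemp -d)               # stands in for /home/cmccabe/Desktop/fastq
touch "$demo_dir/NA12878.bam"       # empty placeholder, not a real BAM

for file in "$demo_dir"/*.bam; do
    bname=$(basename "$file")
    # Relative name: resolved against $PWD, where the file does not exist.
    if [ -e "$bname" ]; then bname_found=yes; else bname_found=no; fi
    # Full path from the glob: found regardless of $PWD.
    if [ -e "$file" ]; then file_found=yes; else file_found=no; fi
    echo "from $PWD: $bname found? $bname_found; $file found? $file_found"
done

# After cd (in a subshell, so this script's $PWD is unchanged) the bare
# name resolves inside demo_dir.
if (cd "$demo_dir" && [ -e "$bname" ]); then after_cd=yes; else after_cd=no; fi
echo "after cd: $bname found? $after_cd"

rm -r "$demo_dir"
```

Running the cd inside a subshell is a design choice: it lets the rest of the script keep its original working directory.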
