Question

Trying to trim the last bp of several samples at once with BBduk

2.6 years ago

julia.mars • 0

Hi,

I am trying to use BBduk to trim my 151bp sequences back to 150bp. I tried to write a loop for this so I could process one entire pool at a time, but I do not get any output files, or at least not where I expected them. It is probably not a difficult problem, but bioinformatics is quite new to me and I haven't found a solution yet. (By the way, I work over an ssh connection on a university Linux system.) Can you help me? This is the script I have used so far:

#!/bin/bash

#SBATCH --ntasks=1
#SBATCH --partition=cpu-long
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=trim_adapters_Pool1A
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxxxxx
#SBATCH --mem=100GB

# Trimming 151bp Illumina reads back to 150bp

# Change to directory where bbduk.sh is
cd /home/s1718118/BBTools/bbmap/

# Run bbduk for all files
for filename in /data/projects/pi-vissermcde/Julia/Pool_1A/*.fastq.gz;
do bash bbduk.sh in=$filename out=/data/s1718118/BBdukoutput/Pool1A/ ftm=5;
done

And this is what I get in the error output file:

java -ea -Xmx42802m -Xms42802m -cp /home/s1718118/BBTools/bbmap/current/ jgi.BBDuk in=/data/projects/pi-vissermcde/Julia/Pool_1A/Li_1177_Julia-Pool-1_74322-8_GACCAAGTTAAATATGCCAG_L001_R1_001_AHG3JLDRXY.filt.fastq.gz out=/data/s1718118/BBdukoutput/Pool1A/ ftm=5
Executing jgi.BBDuk [in=/data/projects/pi-vissermcde/Julia/Pool_1A/Li_1177_Julia-Pool-1_74322-8_GACCAAGTTAAATATGCCAG_L001_R1_001_AHG3JLDRXY.filt.fastq.gz, out=/data/s1718118/BBdukoutput/Pool1A/, ftm=5]
Version 38.92

Unspecified format for output /data/s1718118/BBdukoutput/Pool1A/; defaulting to fastq.
0.028 seconds.
Initial:
Memory: max=44895m, total=44895m, free=44857m, used=38m

Input is being processed as unpaired
Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: /data/s1718118/BBdukoutput/Pool1A (Is a directory)
    at fileIO.ReadWrite.getRawOutputStream(ReadWrite.java:437)
    at fileIO.ReadWrite.getOutputStream(ReadWrite.java:402)
    at fileIO.ReadWrite.getOutputStream(ReadWrite.java:344)
    at stream.ReadStreamWriter.<init>(ReadStreamWriter.java:71)
    at stream.ReadStreamByteWriter.<init>(ReadStreamByteWriter.java:18)
    at stream.ConcurrentGenericReadOutputStream.<init>(ConcurrentGenericReadOutputStream.java:38)
    at stream.ConcurrentReadOutputStream.getStream(ConcurrentReadOutputStream.java:71)
    at stream.ConcurrentReadOutputStream.getStream(ConcurrentReadOutputStream.java:35)
    at jgi.BBDuk.spawnProcessThreads(BBDuk.java:1920)
    at jgi.BBDuk.process2(BBDuk.java:1186)
    at jgi.BBDuk.process(BBDuk.java:1082)
    at jgi.BBDuk.main(BBDuk.java:81)
Caused by: java.io.FileNotFoundException: /data/s1718118/BBdukoutput/Pool1A (Is a directory)
    at java.base/java.io.FileOutputStream.open0(Native Method)
    at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
    at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
    at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:158)
    at fileIO.ReadWrite.getRawOutputStream(ReadWrite.java:435)
    ... 11 more

Thank you in advance!

Note: I redacted the email address in the script.

parameters BBduk loop bash • 904 views


score 1 · Answer 1 · 2021-09-22

2.6 years ago

GenoMax 141k

That is because you are not providing a unique output file name for the bbduk output: `out=` points at a directory, not a file, which is why Java reports "(Is a directory)". For each input sequence file you need a corresponding output file. You can do something like this:

for filename in /data/projects/pi-vissermcde/Julia/Pool_1A/*.fastq.gz
do
    # extract the name of the sample (file name without the directory and the .fastq.gz suffix)
    name=$(basename "${filename}" .fastq.gz)
    # use the name of the sample to create a unique output file name
    bbduk.sh -Xmx4g in="${filename}" out="/data/s1718118/BBdukoutput/Pool1A/${name}_trim.fastq.gz" ftm=5
done
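
The naming step in the loop above can be checked without running BBduk at all. The following is only a minimal dry-run sketch: `trimmed_name` is a hypothetical helper (not part of BBTools) and the two input paths are made up for illustration. It prints the output path that each input would get.

```shell
#!/bin/bash
# Dry run: print the output path for each input without calling bbduk.sh.

# trimmed_name is a hypothetical helper, not part of BBTools.
# $1 = input .fastq.gz path, $2 = output directory
trimmed_name() {
    local name
    # basename strips the directory and the given suffix
    name=$(basename "$1" .fastq.gz)
    echo "$2/${name}_trim.fastq.gz"
}

# Made-up example paths; the files do not need to exist for basename to work.
for filename in /some/dir/sampleA_R1.fastq.gz /some/dir/sampleB_R1.fastq.gz
do
    trimmed_name "$filename" /data/s1718118/BBdukoutput/Pool1A
done
```

If every printed path is distinct and ends in a file name rather than a directory, the same expression should be safe to pass to `out=`.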

Since you are processing the files serially in a single for loop, there is no need to ask for 100GB of RAM (--mem=100GB). BBduk is memory efficient and only needs 2-4GB of RAM.


Thank you! I thought it was something like that, but I did not know how to do it yet. I will try this out. What does the -Xmx4g mean?

ADD REPLY • link 2.6 years ago by julia.mars • 0

-Xmx4g sets the maximum Java heap size to 4GB, i.e. the java command that bbduk.sh launches will use at most about 4GB of RAM.
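
One way to keep the `-Xmx` value in step with the memory requested from SLURM is to derive it from the job's allocation. This is only a sketch under assumptions: `heap_flag` is a hypothetical helper (not part of BBTools or SLURM), the 85% headroom factor is an arbitrary choice, and it relies on SLURM exporting `SLURM_MEM_PER_NODE` (in MB) when the job was submitted with `--mem`.

```shell
#!/bin/bash
# heap_flag is a hypothetical helper, not part of BBTools or SLURM.
# $1 = memory available to the job in MB; prints a matching -Xmx flag.
heap_flag() {
    local mem_mb=$1
    # Leave roughly 15% headroom for non-heap JVM memory (arbitrary choice).
    echo "-Xmx$(( mem_mb * 85 / 100 ))m"
}

# Assumption: SLURM_MEM_PER_NODE holds the --mem request in MB.
# Outside a SLURM job it is unset, so fall back to 4096 MB.
heap_flag "${SLURM_MEM_PER_NODE:-4096}"
```

The printed flag could then be passed to bbduk.sh in place of the hard-coded `-Xmx4g`, so that lowering `--mem` in the job header also lowers the Java heap limit.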

ADD REPLY • link 2.6 years ago by GenoMax 141k