Trying to trim last bp of several samples with BBduk at once
2.8 years ago
julia.mars • 0


I am trying to use BBduk to trim my 151bp sequences back to 150bp. I tried to create a loop for this so I could process one entire pool at a time, but I do not get any output files, or at least not where I expected them. It is probably not a difficult problem, but bioinformatics is quite new to me and I haven't found a solution yet. (By the way, I work via an ssh connection on a university Linux system.) Can you help me? This is the script I have used so far:


#SBATCH --ntasks=1
#SBATCH --partition=cpu-long
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=trim_adapters_Pool1A
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxxxxx
#SBATCH --mem=100GB

# Trimming 151bp Illumina reads back to 150bp

# Change to directory where bbduk.sh is
cd /home/s1718118/BBTools/bbmap/

# Run bbduk for all files
for filename in /data/projects/pi-vissermcde/Julia/Pool_1A/*.fastq.gz;
do bash bbduk.sh in=$filename out=/data/s1718118/BBdukoutput/Pool1A/ ftm=5;
done

And this is what I get in the error output file:

java -ea -Xmx42802m -Xms42802m -cp /home/s1718118/BBTools/bbmap/current/ jgi.BBDuk in=/data/projects/pi-vissermcde/Julia/Pool_1A/Li_1177_Julia-Pool-1_74322-8_GACCAAGTTAAATATGCCAG_L001_R1_001_AHG3JLDRXY.filt.fastq.gz out=/data/s1718118/BBdukoutput/Pool1A/ ftm=5
Executing jgi.BBDuk [in=/data/projects/pi-vissermcde/Julia/Pool_1A/Li_1177_Julia-Pool-1_74322-8_GACCAAGTTAAATATGCCAG_L001_R1_001_AHG3JLDRXY.filt.fastq.gz, out=/data/s1718118/BBdukoutput/Pool1A/, ftm=5]
Version 38.92

Unspecified format for output /data/s1718118/BBdukoutput/Pool1A/; defaulting to fastq.
0.028 seconds.
Memory: max=44895m, total=44895m, free=44857m, used=38m

Input is being processed as unpaired
Exception in thread "main" java.lang.RuntimeException: /data/s1718118/BBdukoutput/Pool1A (Is a directory)
    at fileIO.ReadWrite.getRawOutputStream(
    at fileIO.ReadWrite.getOutputStream(
    at fileIO.ReadWrite.getOutputStream(
    at stream.ReadStreamWriter.<init>(
    at stream.ReadStreamByteWriter.<init>(
    at stream.ConcurrentGenericReadOutputStream.<init>(
    at stream.ConcurrentReadOutputStream.getStream(
    at stream.ConcurrentReadOutputStream.getStream(
    at jgi.BBDuk.spawnProcessThreads(
    at jgi.BBDuk.process2(
    at jgi.BBDuk.process(
    at jgi.BBDuk.main(
Caused by: /data/s1718118/BBdukoutput/Pool1A (Is a directory)
    at java.base/ Method)
    at java.base/
    at java.base/<init>(
    at java.base/<init>(
    at fileIO.ReadWrite.getRawOutputStream(
    ... 11 more

Thank you in advance!

Note: I redacted the email address in the script.

parameters BBduk loop bash • 988 views
2.8 years ago
GenoMax 144k

That is because you are not providing a unique output file name for the bbduk output: out= points to a directory, which is why Java reports "(Is a directory)". For each input sequence file you need a corresponding output file. You can do something like this:

for filename in /data/projects/pi-vissermcde/Julia/Pool_1A/*.fastq.gz
do
    # extract name of the sample
    name=$(basename "${filename}" .fastq.gz)
    # use the name of the sample to create a unique output file name
    bbduk.sh -Xmx4g in="${filename}" out=/data/s1718118/BBdukoutput/Pool1A/"${name}"_trim.fastq.gz ftm=5
done
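Before submitting the job, it can help to check which output names the loop would produce. The following is only a sketch: `make_out_path` and `dry_run` are hypothetical helper names, and the `_trim.fastq.gz` suffix simply follows the naming suggested above. It prints the bbduk commands instead of running them.

```shell
#!/bin/bash
# Hypothetical helper: build the output path for one input file.
# Strips the directory and the .fastq.gz suffix, then appends _trim.fastq.gz.
make_out_path() {
    local in_file=$1
    local out_dir=$2
    local name
    name=$(basename "$in_file" .fastq.gz)
    printf '%s/%s_trim.fastq.gz\n' "$out_dir" "$name"
}

# Hypothetical helper: print one bbduk command per input file without running it.
dry_run() {
    local in_dir=$1
    local out_dir=$2
    local f
    for f in "$in_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        echo "bbduk.sh -Xmx4g in=$f out=$(make_out_path "$f" "$out_dir") ftm=5"
    done
}
```

Calling `dry_run /data/projects/pi-vissermcde/Julia/Pool_1A /data/s1718118/BBdukoutput/Pool1A` should list one command per input file, each with a distinct `out=` file name.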

Since you are running these trimming jobs serially in one for loop, there is no need to ask for 100GB of RAM (--mem=100GB). BBduk is memory efficient and only needs 2-4G of RAM.
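As a sketch, the job header could be reduced along these lines. The 6G value is an assumption, not a tested figure: it covers the 4G Java heap set with -Xmx4g plus some headroom for the JVM itself. The other directives are copied from the original script.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --partition=cpu-long
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=trim_adapters_Pool1A
# Assumption: 4G heap (-Xmx4g) plus headroom; adjust for your cluster.
#SBATCH --mem=6G
```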


Thank you! I thought it was something like that, but I did not know how to do it yet. I will try this out. What does the -Xmx4g mean?


-Xmx4g sets the maximum Java heap size, so the java command that runs bbduk uses at most 4G of RAM.
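For comparison, the log in the question shows java being started with -Xmx42802m when no limit was given. A small sketch to put both settings in the same unit follows. `xmx_to_mib` is a hypothetical helper, and it only handles the k, m and g suffixes or plain bytes.

```shell
#!/bin/bash
# Hypothetical helper: convert a JVM heap setting such as -Xmx4g or
# -Xmx42802m into mebibytes so two settings can be compared.
xmx_to_mib() {
    local value=${1#-Xmx}            # strip the leading "-Xmx"
    local number=${value%[kKmMgG]}   # numeric part without the unit letter
    local unit=${value#"$number"}    # trailing unit letter, if any
    case $unit in
        g|G) echo $(( number * 1024 )) ;;
        m|M) echo "$number" ;;
        k|K) echo $(( number / 1024 )) ;;
        *)   echo $(( number / 1024 / 1024 )) ;;   # no suffix: plain bytes
    esac
}

xmx_to_mib -Xmx4g       # prints 4096
xmx_to_mib -Xmx42802m   # prints 42802
```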

