Trying to trim last bp of several samples with BBduk at once
4 weeks ago
julia.mars • 0


I am trying to use BBduk to trim my 151bp sequences back to 150bp. I tried to create a loop for this so I could do one entire pool at a time, but I do not get any output files, or at least not where I expected them. It is probably not a difficult problem, but bioinformatics is quite new for me and I haven't found a solution yet. (Btw, I work via an ssh connection on a university Linux system.) Can you help me? This is the script I used so far:


#SBATCH --ntasks=1
#SBATCH --partition=cpu-long
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=trim_adapters_Pool1A
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxxxxx
#SBATCH --mem=100GB

#Trimming 151bp Illumina reads back to 150bp

# Change to directory where bbduk.sh is
cd /home/s1718118/BBTools/bbmap/

#Run bbduk for all files
for filename in /data/projects/pi-vissermcde/Julia/Pool_1A/*.fastq.gz;
do bash bbduk.sh in=$filename out=/data/s1718118/BBdukoutput/Pool1A/ ftm=5;
done

And this is what I get in the error output file:

java -ea -Xmx42802m -Xms42802m -cp /home/s1718118/BBTools/bbmap/current/ jgi.BBDuk in=/data/projects/pi-vissermcde/Julia/Pool_1A/Li_1177_Julia-Pool-1_74322-8_GACCAAGTTAAATATGCCAG_L001_R1_001_AHG3JLDRXY.filt.fastq.gz out=/data/s1718118/BBdukoutput/Pool1A/ ftm=5
Executing jgi.BBDuk [in=/data/projects/pi-vissermcde/Julia/Pool_1A/Li_1177_Julia-Pool-1_74322-8_GACCAAGTTAAATATGCCAG_L001_R1_001_AHG3JLDRXY.filt.fastq.gz, out=/data/s1718118/BBdukoutput/Pool1A/, ftm=5]
Version 38.92

Unspecified format for output /data/s1718118/BBdukoutput/Pool1A/; defaulting to fastq.
0.028 seconds.
Memory: max=44895m, total=44895m, free=44857m, used=38m

Input is being processed as unpaired
Exception in thread "main" java.lang.RuntimeException: /data/s1718118/BBdukoutput/Pool1A (Is a directory)
    at fileIO.ReadWrite.getRawOutputStream(
    at fileIO.ReadWrite.getOutputStream(
    at fileIO.ReadWrite.getOutputStream(
    at stream.ReadStreamWriter.<init>(
    at stream.ReadStreamByteWriter.<init>(
    at stream.ConcurrentGenericReadOutputStream.<init>(
    at stream.ConcurrentReadOutputStream.getStream(
    at stream.ConcurrentReadOutputStream.getStream(
    at jgi.BBDuk.spawnProcessThreads(
    at jgi.BBDuk.process2(
    at jgi.BBDuk.process(
    at jgi.BBDuk.main(
Caused by: /data/s1718118/BBdukoutput/Pool1A (Is a directory)
    at java.base/ Method)
    at java.base/
    at java.base/<init>(
    at java.base/<init>(
    at fileIO.ReadWrite.getRawOutputStream(
    ... 11 more

Thank you in advance!

Note: I redacted the email address in the script.

parameters BBduk loop bash • 186 views
4 weeks ago
GenoMax 108k

That is because you are not providing a unique output file name for the bbduk output: out= points at a directory, which is why Java reports "(Is a directory)". For each input sequence file you need a corresponding output file. You can do something like this:

for filename in /data/projects/pi-vissermcde/Julia/Pool_1A/*.fastq.gz;
do
# extract name of the sample
name=$(basename ${filename} .fastq.gz)
# use the name of the sample to create a unique output file name
bbduk.sh -Xmx4g in=$filename out=/data/s1718118/BBdukoutput/Pool1A/${name}_trim.fastq.gz ftm=5;
done

Since you are running these jobs serially in a single for loop, there is no need to ask for 100GB of RAM (--mem=100GB). BBduk is memory efficient and only needs 2-4 GB of RAM.
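To make the naming step easier to reuse, here is a minimal sketch of the same loop split into two functions. The names out_path_for and trim_pool are made up for illustration, and the sketch assumes bbduk.sh is on your PATH (otherwise cd into the bbmap directory and call ./bbduk.sh). ftm=5 right-trims each read to a length that is a multiple of 5, so 151bp becomes 150bp.

```shell
# Hypothetical helper: build a unique output path for one input file.
out_path_for() {
    local in_file="$1" out_dir="$2"
    local name
    name=$(basename "$in_file" .fastq.gz)    # strip directory and extension
    printf '%s/%s_trim.fastq.gz\n' "$out_dir" "$name"
}

# Hypothetical wrapper: trim every *.fastq.gz in in_dir, one file at a time.
# Assumption: bbduk.sh is on PATH.
trim_pool() {
    local in_dir="$1" out_dir="$2"
    local f
    mkdir -p "$out_dir"                      # make sure the output directory exists
    for f in "$in_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue              # skip if the glob matched nothing
        bbduk.sh -Xmx4g in="$f" out="$(out_path_for "$f" "$out_dir")" ftm=5
    done
}
```

In the batch script you would then call it as: trim_pool /data/projects/pi-vissermcde/Julia/Pool_1A /data/s1718118/BBdukoutput/Pool1A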

Thank you! I thought it was something like that, but did not know how to do that yet. I will try this out. What does the -Xmx4g mean?

-Xmx4g tells Java to use at most 4 GB of RAM (the maximum heap size) for the bbduk command.
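The java line echoed at the top of the error output in the question shows the heap that was used when no -Xmx was given (-Xmx42802m, roughly 42 GB). As a rough sketch, a small helper can pull that setting out of a job log so you can check that the flag took effect. The name xmx_from_log is made up, and the sketch assumes the log contains the echoed java command as in the question.

```shell
# Hypothetical helper: print the first -Xmx setting found in a BBDuk log file.
xmx_from_log() {
    # "--" stops option parsing so the pattern may start with a dash.
    grep -o -- '-Xmx[0-9]*[kmgKMG]' "$1" | head -n 1
}
```

For example, running xmx_from_log on the error_output_<jobid>.txt file written by SLURM prints the heap setting that was passed to java.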

