Trying to trim last bp of several samples with BBduk at once
1
1
Entering edit mode
2.6 years ago
julia.mars • 0

Hi,

I am trying to use BBduk to trim back my 151bp sequences to 150bp. I tried to create a loop for this so I could do one entire pool at the time, but I do not get any output files. Or at least not where I thought I would get them. It is probably not a difficult problem, but Bioinformatics are quite new for me and I haven't found a solution yet. (Btw, I work via an ssh connection on a university linux system). Can you help me? This is the script I used so far:

#!/bin/bash

#SBATCH --ntasks=1
#SBATCH --partition=cpu-long
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=trim_adapters_Pool1A
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxxxxx
#SBATCH --mem=100GB

#Trimmin 150bp Illumina reads back to 150bp

# Change to directory where bbduk.sh is
cd /home/s1718118/BBTools/bbmap/

#Run bbduk for al files
for filename in /data/projects/pi-vissermcde/Julia/Pool_1A/*.fastq.gz;
do bash bbduk.sh in=$filename out=/data/s1718118/BBdukoutput/Pool1A/ ftm=5;
done

And this is what i get in the error output file:

java -ea -Xmx42802m -Xms42802m -cp /home/s1718118/BBTools/bbmap/current/ jgi.BBDuk in=/data/projects/pi-vissermcde/Julia/Pool_1A/Li_1177_Julia-Pool-1_74322-8_GACCAAGTTAAATATGCCAG_L001_R1_001_AHG3JLDRXY.filt.fastq.gz out=/data/s1718118/BBdukoutput/Pool1A/ ftm=5
Executing jgi.BBDuk [in=/data/projects/pi-vissermcde/Julia/Pool_1A/Li_1177_Julia-Pool-1_74322-8_GACCAAGTTAAATATGCCAG_L001_R1_001_AHG3JLDRXY.filt.fastq.gz, out=/data/s1718118/BBdukoutput/Pool1A/, ftm=5]
Version 38.92

Unspecified format for output /data/s1718118/BBdukoutput/Pool1A/; defaulting to fastq.
0.028 seconds.
Initial:
Memory: max=44895m, total=44895m, free=44857m, used=38m

Input is being processed as unpaired
Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: /data/s1718118/BBdukoutput/Pool1A (Is a directory)
    at fileIO.ReadWrite.getRawOutputStream(ReadWrite.java:437)
    at fileIO.ReadWrite.getOutputStream(ReadWrite.java:402)
    at fileIO.ReadWrite.getOutputStream(ReadWrite.java:344)
    at stream.ReadStreamWriter.<init>(ReadStreamWriter.java:71)
    at stream.ReadStreamByteWriter.<init>(ReadStreamByteWriter.java:18)
    at stream.ConcurrentGenericReadOutputStream.<init>(ConcurrentGenericReadOutputStream.java:38)
    at stream.ConcurrentReadOutputStream.getStream(ConcurrentReadOutputStream.java:71)
    at stream.ConcurrentReadOutputStream.getStream(ConcurrentReadOutputStream.java:35)
    at jgi.BBDuk.spawnProcessThreads(BBDuk.java:1920)
    at jgi.BBDuk.process2(BBDuk.java:1186)
    at jgi.BBDuk.process(BBDuk.java:1082)
    at jgi.BBDuk.main(BBDuk.java:81)
Caused by: java.io.FileNotFoundException: /data/s1718118/BBdukoutput/Pool1A (Is a directory)
    at java.base/java.io.FileOutputStream.open0(Native Method)
    at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
    at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
    at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:158)
    at fileIO.ReadWrite.getRawOutputStream(ReadWrite.java:435)
    ... 11 more

Thankyou in advance!

Note: I redacted the email address in script

parameters BBduk loop bash • 904 views
ADD COMMENT
1
Entering edit mode
2.6 years ago
GenoMax 141k

That is because you are not providing unique output file name for the bbduk output. For each input sequence file you need a corresponding output file. You can do something like this

for filename in /data/projects/pi-vissermcde/Julia/Pool_1A/*.fastq.gz;
do 
# extract name of the sample
name=$(basename ${filename} .fastq.gz)
# use the name of the sample to create a unique output file name
bbduk.sh -Xmx4g in=$filename out=/data/s1718118/BBdukoutput/Pool1A/${name}_trim.fastq.gz ftm=5;
done

Since you are doing these conversions serially via one for loop there is no need to ask for 100GB of RAM (--mem=100GB). BBduk is memory efficient and only needs 2-4G of RAM.

ADD COMMENT
0
Entering edit mode

Thankyou! I thought it was something like that, but did not know how to do that yet. I will try this out. What does the -Xmx4g mean?

ADD REPLY
1
Entering edit mode

-Xmx4g means use max 4G of RAM for the java command.

ADD REPLY

Login before adding your answer.

Traffic: 1990 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6