job output from bowtie2 quickly becomes gigantic
1
0
Entering edit mode
3.4 years ago
N.D.Hubot ▴ 10

Hi there,

I am trying to remove the reads from the human genome from some metagenomic samples. For that I am creating a for loop that gets through my samples and first align the reads to the reference genome, then send the output sam file to samtools to create a bam file and finally use that bam file to filter out the reads matching the human reference genome. That job is being sent on a cluster using SLURM. See below the script for the job:

#!/bin/bash
#
#SBATCH --job-name=host_removal
#SBATCH --output=remove_human.txt
#SBATCH --cpus-per-task=40
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --time=24:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxxxxx

cd /scratch/ndh1n17/trimmed

for i in *1.fq.gz
do
        echo $i
        prefix=$(basename $i 1.fq.gz)
        bowtie2 --threads 40 --quiet -x /scratch/ndh1n17/reference_genomes/human/GRCh38.fna -1 ${prefix}1.fq.gz -2 ${prefix}2.fq.gz \
         2> out.log | samtools view -bSh > ${prefix}.mapped.bam

        samtools fastq -f 4 -1 ${prefix}1.u.fastq -2 ${prefix}2.u.fastq ${prefix}.mapped.bam
done

As soon as I send this job, the output file (remove_human.txt) showing what gets printed out from the command gets filled with lines that look like the output for the sam file... I don't understand what is wrong in my script???!!!

Could someone help me, please? Thank you in advance for your help.

Nathan

bowtie2 alignement • 1.5k views
ADD COMMENT
0
Entering edit mode

I redacted the email address included in the post.

ADD REPLY
0
Entering edit mode

Since no one has commented I will take a crack. If you are capturing the log from bowtie2 using out.log then you should consider only capturing an error log from your SLURM job using -e error.log instead of the output. You may also be able to pipe the output of bowtie2 directly into samtools fastq avoiding the conversion to BAM. You may want to use .fastq.gz to keep the final output files compressed in that command.

ADD REPLY
0
Entering edit mode
3.4 years ago
ATpoint 89k

If it's indeed the SAM file ending up in the slurm log then the pipe is dysfunctional. This can be a newline error at the end of the first bowtie line or the stderr redirection doing some hickups here. Put together a minimal working example dataset with a few reads that completes within seconds and test outside of slurm until it works. I would start by removing that stderr redirection which anyway gets overwritten in every iteration.

ADD COMMENT
1
Entering edit mode

Thank you very much for your comments and suggestions. I have divided the script into separate commands and it worked fine. I don't know what happened but I will leave it there. Thanks again

ADD REPLY

Login before adding your answer.

Traffic: 3657 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6