Hi there,
I am trying to remove human reads from some metagenomic samples. To do that, I wrote a for loop that goes through my samples, first aligns the reads to the reference genome, then pipes the output SAM to samtools to create a BAM file, and finally uses that BAM file to filter out the reads matching the human reference genome. The job is submitted to a cluster using SLURM. Here is the job script:
#!/bin/bash
#
#SBATCH --job-name=host_removal
#SBATCH --output=remove_human.txt
#SBATCH --cpus-per-task=40
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --time=24:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxxxxx
cd /scratch/ndh1n17/trimmed
for i in *1.fq.gz
do
echo $i
prefix=$(basename $i 1.fq.gz)
bowtie2 --threads 40 --quiet -x /scratch/ndh1n17/reference_genomes/human/GRCh38.fna -1 ${prefix}1.fq.gz -2 ${prefix}2.fq.gz \
2> out.log | samtools view -bSh > ${prefix}.mapped.bam
samtools fastq -f 4 -1 ${prefix}1.u.fastq -2 ${prefix}2.u.fastq ${prefix}.mapped.bam
done
As soon as I submit this job, the output file (remove_human.txt), which should capture what the commands print, fills up with lines that look like SAM output... I don't understand what is wrong with my script!
Could someone help me, please? Thank you in advance for your help.
Nathan
I redacted the email address included in the post.
Since no one has commented I will take a crack. If you are capturing the log from bowtie2 using out.log, then you should consider only capturing an error log from your SLURM job using -e error.log instead of the output. You may also be able to pipe the output of bowtie2 directly into samtools fastq, avoiding the conversion to BAM. You may want to use .fastq.gz in that command to keep the final output files compressed.
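A minimal sketch of that streamlined loop, untested on your data: it assumes your bowtie2 index was built with GRCh38.fna as its basename (bowtie2's -x wants the index prefix, not a FASTA file), drops the 2> redirect so bowtie2's log lands in the SLURM error file, and relies on samtools fastq compressing outputs whose names end in .gz.

```shell
#!/bin/bash
#SBATCH --job-name=host_removal
#SBATCH --output=remove_human.txt
#SBATCH --error=error.log        # bowtie2's progress/log lines go here now
#SBATCH --cpus-per-task=40
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --time=24:00:00

cd /scratch/ndh1n17/trimmed

for i in *1.fq.gz
do
    echo "$i"
    # Strip the shared suffix to get the sample prefix, e.g. sample_1.fq.gz -> sample_
    prefix=$(basename "$i" 1.fq.gz)
    # Pipe the SAM from bowtie2 straight into samtools fastq (no BAM on disk);
    # -f 4 keeps unmapped reads, as in the original command, and the .gz
    # extensions ask samtools fastq to write compressed output.
    bowtie2 --threads 40 --quiet -x /scratch/ndh1n17/reference_genomes/human/GRCh38.fna \
        -1 "${prefix}1.fq.gz" -2 "${prefix}2.fq.gz" \
        | samtools fastq -f 4 -1 "${prefix}1.u.fastq.gz" -2 "${prefix}2.u.fastq.gz" -
done
```

If you want only pairs where both mates failed to map, -f 12 instead of -f 4 is the usual flag, but check the samtools flag documentation for your version.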