Hi Biostars,
I am trying to write a bash script that takes every fastq R1 & R2 pair in my cleaned reads directory and aligns them to my reference with bwa mem, coverts SAM to BAM with samtools view, and sorts BAM for SNP calling with samtools sort. Note that I am working with pretty limited storage so I am really trying to avoid build-up of the intermediate SAM & BAM files.
Here is my script right now:
#!/bin/bash
#index reference genome
bwa mem index .../reference_genome.fasta
for file in .../cleaned_reads/*R1.fastq.gz;
do
name = $(basename "$file" .R1.fastq.gz)
echo "Assembling plastome for $name"
forward=$name".R1.fastq.gz"
reverse=$name".R2.fastq.gz"
bam=.../sorted_bam_files/"$name".bam
bwa mem -t 6 .../reference_genome.fasta .../cleaned_reads/"$forward" ..../cleaned reads/"$reverse" |
samtools view -b -h | samtools sort --threads 6 -o "$bam";
done
The current script seems to work with no problems for the bwa step but then doesn't move on to the next step and produces no output. Any advice on this would be appreciated! I am also pretty new to bash scripting so apologies in advance if it's a very obvious issue.
Thanks! Lee
You do not need the
samtools view
. Try the following (note the-
at the end of the command line, it is required).after you learned simple loops, use a Makefile and parallelize with
make -j 123
, and then later learn how to use nextflow or snakemake