hello humans,
I am struggling with a bash script that, as far as I can see, should work. I need to extract the prefixes of 262 files in a directory that contains reads. I will map them for later variant calling etc. However, while troubleshooting because the mapping always got stuck, I found that not all prefixes are extracted properly. When I print the output to a .txt file, I see that only about 200 are extracted properly, and some of those contain mistakes as well. I don't know why. I think the problem might be the prefix extraction step, but maybe my mapping line is also incorrect. Maybe someone here knows how to fix this.
So my input files look like this. 262 files, with the naming convention:
AH-00001_S117_L001_R1_P_trimmed.001.fastq
AH-00001_S117_L001_R2_P_trimmed.001.fastq
KewS01_S470_L001_R1_P_trimmed.001.fastq
KewS01_S470_L001_R2_P_trimmed.001.fastq
(...)
My script looks like this:
#!/bin/bash
(..)
#SBATCH --array=1-262
filenameFWD=$(ls -1 *_L001_R1_P_trimmed.001.fastq | tail -n +${SLURM_ARRAY_TASK_ID} | head -1)
filenameREV=$(ls -1 *_L001_R2_P_trimmed.001.fastq | tail -n +${SLURM_ARRAY_TASK_ID} | head -1)
prefixFWD=$(echo $filenameFWD | sed 's/_L001_R1_P_trimmed\.001\.fastq$//')
prefixREV=$(echo $filenameREV | sed 's/_L001_R2_P_trimmed\.001\.fastq$//')
module load NextGenMap
module load SAMtools/1.3.1
ngm -r Reference.fasta -1 ${prefixFWD}L001_R1_P_trimmed.001.fastq -2 ${prefixREV}L001_R1_P_trimmed.001.fastq -t 12 -o Test1.sam
samtools view -bS Test1.sam | samtools sort -o Test1.bam
samtools index Test1.bam
Why doesn't it work properly? Also, would you recommend using set -o errexit and set -o nounset at the beginning of the script?
Thanks in advance!
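A few things in the script above stand out. The sed call strips the underscore together with the suffix, so ${prefixFWD}L001_R1_P_trimmed.001.fastq expands to a name like AH-00001_S117L001_R1_P_trimmed.001.fastq, which does not match the naming convention shown. The -2 argument also points at R1 rather than R2, and every array task writes to the same Test1.sam and Test1.bam. The following is only a sketch of one possible fix, not a tested pipeline. It assumes bash, takes the file list from a glob instead of parsing ls, and names the outputs after the sample prefix (an assumed naming scheme). To stay runnable without NextGenMap or SLURM it works on empty demo files in a scratch directory and only prints the commands.

```shell
#!/bin/bash
set -euo pipefail   # stop on errors, on unset variables, and on failed pipeline stages

# Demo only: a scratch directory with empty files named like the examples in the question.
demo_dir=$(mktemp -d)
touch "$demo_dir"/AH-00001_S117_L001_R{1,2}_P_trimmed.001.fastq \
      "$demo_dir"/KewS01_S470_L001_R{1,2}_P_trimmed.001.fastq
cd "$demo_dir"

suffix_fwd=_L001_R1_P_trimmed.001.fastq
suffix_rev=_L001_R2_P_trimmed.001.fastq

# A glob expands in sorted order, so the list is stable without parsing ls output.
fwd_files=( *"$suffix_fwd" )

# SLURM sets SLURM_ARRAY_TASK_ID (1-based); default to 1 so the demo runs outside SLURM.
task_id=${SLURM_ARRAY_TASK_ID:-1}
fwd=${fwd_files[task_id-1]}

# Strip the suffix with parameter expansion; the leading underscore belongs to the suffix.
prefix=${fwd%"$suffix_fwd"}
rev=${prefix}${suffix_rev}

# Fail early if the reverse-read file is missing.
[[ -f $rev ]] || { echo "missing mate for $fwd" >&2; exit 1; }

# Dry run: print the commands with per-sample output names instead of Test1.*
echo "ngm -r Reference.fasta -1 $fwd -2 $rev -t 12 -o $prefix.sam"
echo "samtools view -bS $prefix.sam | samtools sort -o $prefix.bam"
echo "samtools index $prefix.bam"
```

On the second question: set -o errexit and set -o nounset (here written as set -euo pipefail, which adds pipefail) make a task stop at the first failing command or misspelled variable instead of carrying on with an empty file name, which is usually what you want in an array job.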
whoa
You should use a workflow manager like Snakemake or Nextflow. Your future self will thank you.
What's wrong with that, huh? :D
Well, how do you manage things when 10 of the 262 fastqs fail?
What would you suggest?
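Short of switching to a workflow manager, one way to deal with failed samples is to check afterwards which outputs are missing and resubmit only those array indices. The sketch below is an illustration under assumptions: it supposes each task writes <prefix>.bam and that samtools index then leaves <prefix>.bam.bai next to it (the script in the question writes Test1.bam for every task, so this naming is hypothetical), and map_reads.sh is a made-up name for the array script. It runs on empty demo files and only prints the sbatch command.

```shell
#!/bin/bash
set -euo pipefail

# Demo only: two samples, of which the first is pretended to have finished.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch AH-00001_S117_L001_R1_P_trimmed.001.fastq KewS01_S470_L001_R1_P_trimmed.001.fastq
touch AH-00001_S117.bam AH-00001_S117.bam.bai

suffix_fwd=_L001_R1_P_trimmed.001.fastq
failed_ids=()
task_id=0

# Walk the forward-read files in the same sorted glob order the array job would use,
# so that the position in the list corresponds to the array task ID.
for fwd in *"$suffix_fwd"; do
    task_id=$((task_id + 1))
    prefix=${fwd%"$suffix_fwd"}
    # Treat a sample as failed if its BAM index does not exist.
    if [[ ! -e $prefix.bam.bai ]]; then
        failed_ids+=("$task_id")
        echo "task $task_id ($prefix) has no index" >&2
    fi
done

# Join the failed IDs with commas and print a resubmission command (dry run).
if (( ${#failed_ids[@]} > 0 )); then
    ids=$(IFS=,; echo "${failed_ids[*]}")
    echo "sbatch --array=$ids map_reads.sh"
fi
```

This only works if the check and the mapping script list the files in the same order. A workflow manager does this bookkeeping for you, which is the point made above.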