I have the following script:
bwa mem -t 10 -R "@RG\tID:xxx\tSM:xxxx\tLB:LB-1\tPU:xxx\tPL:ILLUMINA" ref_genome.fa sample_1_1.fastq sample_1_2.fastq | samtools view -@ 10 -b - | samtools sort -@ 10 -o sample_1.bam
I also have a spreadsheet with one row per sample. Its columns hold the forward reads file (sample 1, sample 2, sample 3, etc.), the reverse reads file, and each of the read group variables.
How can I fill in the values of ID, SM and PU, the fastq file names and the bam file name programmatically from my spreadsheet, and run the samples in parallel, so that I don't have to enter them all manually and can make the most of my computing resources?
I'm writing this as a bash script and I'm fairly new to coding.
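To make the goal concrete, here is a rough sketch of the kind of thing I imagine, not a tested solution. It assumes the spreadsheet is exported as a CSV file with a header row and six columns in the order forward,reverse,ID,SM,PU,bam, and that no field contains commas or quotes. The column order, the function names (build_cmd, make_cmds, run_all) and the default thread and job counts are all made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only. Assumes a CSV export with a header row and columns in this
# (hypothetical) order: forward,reverse,ID,SM,PU,bam

# Print the alignment pipeline for one sample as a single line of text.
build_cmd() {
    local fwd=$1 rev=$2 id=$3 sm=$4 pu=$5 bam=$6 threads=${7:-4}
    # '\\t' in the printf format prints a literal backslash-t, which bwa expects in -R
    printf 'bwa mem -t %s -R "@RG\\tID:%s\\tSM:%s\\tLB:LB-1\\tPU:%s\\tPL:ILLUMINA" ref_genome.fa %s %s | samtools view -@ %s -b - | samtools sort -@ %s -o %s\n' \
        "$threads" "$id" "$sm" "$pu" "$fwd" "$rev" "$threads" "$threads" "$bam"
}

# Print one pipeline per data row of the CSV.
make_cmds() {
    local csv=$1 threads=${2:-4} fwd rev id sm pu bam
    # tail skips the header row; tr strips Windows line endings from spreadsheet exports
    tail -n +2 "$csv" | tr -d '\r' |
    while IFS=, read -r fwd rev id sm pu bam || [ -n "$fwd" ]; do
        [ -n "$fwd" ] || continue    # skip blank lines
        build_cmd "$fwd" "$rev" "$id" "$sm" "$pu" "$bam" "$threads"
    done
}

# Run the pipelines, keeping at most max_jobs of them going at once.
run_all() {
    local csv=$1 max_jobs=${2:-2} threads=${3:-4} cmd
    while IFS= read -r cmd; do
        # Wait for a free slot before starting the next sample
        while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do sleep 1; done
        bash -c "$cmd" </dev/null &
    done < <(make_cmds "$csv" "$threads")
    wait    # block until every background job has finished
}

# Only run when a CSV path is given, e.g.: bash align_all.sh samples.csv 2 5
if [ "$#" -ge 1 ]; then
    run_all "$@"
fi
```

Calling make_cmds on its own prints the commands without running them, which seems like a safe way to check that the spreadsheet values land in the right places. I assume the number of parallel jobs multiplied by the threads per job should stay at or below the number of cores available.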
Thanks!