Question

Assigning variables programmatically for bwa-mem

2.5 years ago

Joy P ▴ 10

I have the following script:

bwa mem -t 10 -R "@RG\tID:xxx\tSM:xxxx\tLB:LB-1\tPU:xxx\tPL:ILLUMINA" ref_genome.fa sample_1_1.fastq sample_1_2.fastq | samtools view -@ 10 -b - | samtools sort -@ 10 -o sample_1.bam

I also have a spreadsheet with a column for the forward reads (sample 1, sample 2, sample 3, etc.), a column for the reverse reads, and a column for each of the read group variables. Each row contains all the information for one sample.

How can I assign the values of ID, SM and PU, fastq file names and bam file names programmatically from my spreadsheet and run the samples in parallel so that I don't have to input them all manually and can make the most of my computing resources?

I'm using a bash script and I'm fairly new to coding.

Thanks!

updated 2.5 years ago by Istvan Albert 100k • written 2.5 years ago by Joy P ▴ 10

score 2 · Answer 1 · 2021-11-03

Use GNU parallel.

Suppose the file ids.csv contains:

A,X,1
B,Y,2
C,Z,3

Then, using parallel, you could write:

cat ids.csv | parallel --colsep=',' echo First={1}, Second={2}, Third={3}

It prints (the order of the lines can vary between runs unless you add --keep-order):

First=A, Second=X, Third=1
First=B, Second=Y, Third=2
First=C, Second=Z, Third=3
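
The same column idea can be carried over to the bwa mem pipeline from the question. Below is a minimal sketch, not a tested production script. It assumes the spreadsheet has been exported to a header-less CSV with six columns in this order: ID, SM, PU, forward fastq, reverse fastq, output bam. That layout, the file name samples.csv, and the helper names build_cmd and build_all are all hypothetical. It also assumes no value contains commas, spaces, or quotes. The functions only print the command lines, so they can be inspected before anything is run.

```shell
#!/bin/sh
# Sketch: turn each CSV row into one bwa mem | samtools pipeline (printed, not run).
# ASSUMED (hypothetical) column order, no header row:
#   ID,SM,PU,forward.fastq,reverse.fastq,output.bam

# Print the pipeline for a single sample.
# In the printf format, \\t emits a literal backslash-t, which bwa mem
# itself converts to a tab inside the read group string.
build_cmd() {
    printf 'bwa mem -t 10 -R "@RG\\tID:%s\\tSM:%s\\tLB:LB-1\\tPU:%s\\tPL:ILLUMINA" ref_genome.fa %s %s | samtools view -@ 10 -b - | samtools sort -@ 10 -o %s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Read CSV rows from stdin and print one command line per row.
# tr removes the carriage returns that spreadsheet exports often add.
build_all() {
    tr -d '\r' | while IFS=, read -r id sm pu fwd rev bam; do
        # Skip empty lines.
        if [ -n "$id" ]; then
            build_cmd "$id" "$sm" "$pu" "$fwd" "$rev" "$bam"
        fi
    done
}

# Usage (after sourcing this file or pasting the functions into a script):
#   build_all < samples.csv > commands.txt   # write the commands, check them by eye
#   parallel --jobs 2 < commands.txt         # then run two samples at a time
```

When GNU parallel is given no command, it treats each input line as a command to run, which is why the generated commands.txt can be piped straight into it. Each pipeline above already asks for 10 threads per tool, so --jobs should be chosen so that the number of jobs times the threads per job fits the number of cores available.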

Look for tutorials like this:

Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them