Question

Generating shell scripts

1

8.5 years ago

ebrown1955 ▴ 320

I have 1000 samples that I need to run through a certain pipeline. Are there any programs that can generate shell scripts for each sample?

For example, if I have a command to run BWA on a sample, e.g.

bwa mem -R "@RG\tID:[sample]\tPL:ILLUMINA\tLB:lib1" $REF [sample]_1.fastq [sample]_2.fastq > [sample].sam

How can I generate this script for 1000 samples (replacing "[sample]" with the respective sample ID)? Are there any tools that do this sort of batch processing? I've heard of people using make, but I'm unsure of what they mean.

I know I can write a loop that processes each file separately; however, I'd like to submit the jobs separately because there is a time limit on each job I can submit to my cluster.

shell bash • 3.3k views

ADD COMMENT • link updated 8.5 years ago by Pierre Lindenbaum 161k • written 8.5 years ago by ebrown1955 ▴ 320

Ram · Answer 1 · 2015-10-19

7

8.5 years ago

Pierre Lindenbaum 161k

Yes, you can generate this using make. See the example below:

This makefile will generate the following statements:

Running make with the -j option will run independent targets in parallel. See also: https://github.com/lindenb/ngsxml

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.5 years ago by Pierre Lindenbaum 161k

Ram · Answer 2 · 2015-10-19

3

8.5 years ago

Alex Reynolds 35k

Depending on your sample naming scheme, perhaps use a for loop with seq:

#!/bin/bash

for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    # The read group ID is the sample ID; REF must be set to the indexed reference
    bwa mem -R "@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1" "$REF" "${SAMPLE_ID}_1.fastq" "${SAMPLE_ID}_2.fastq" > "${SAMPLE_ID}.sam"
done

If it isn't clear what this does, you might first run this to see how it works:

#!/bin/bash

for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    echo "${SAMPLE_ID}"
done

If you are submitting jobs to a cluster (say, with qsub) then you just qsub within the for loop:

#!/bin/bash

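for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    # Add your site-specific qsub options after "qsub". The command is piped to qsub on stdin
    # (most qsub implementations read the job script from stdin when no script file is given),
    # so the redirection to the .sam file happens inside the job, not in this loop.
    printf '%s\n' "bwa mem -R '@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1' $REF ${SAMPLE_ID}_1.fastq ${SAMPLE_ID}_2.fastq > ${SAMPLE_ID}.sam" | qsub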
done

To be a good neighbor to your fellow cluster users, you will want to debug and understand how this script will run, before you submit 1000 jobs to your cluster.
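
One way to do that debugging is a dry run that only prints the command for the first few samples, so you can read exactly what would be executed. This is a minimal sketch, assuming the sample_N naming used above; build_cmd, N_PREVIEW and the default reference.fa path are made-up names for illustration, not anything from your cluster setup.

```shell
#!/bin/bash
# Dry run: print the command for the first few samples instead of running it.
# REF default and N_PREVIEW are illustrative values (assumptions).
REF="${REF:-reference.fa}"
N_PREVIEW=3

build_cmd() {
    # Emit the bwa command line for one sample ID as plain text.
    local sample_id="$1"
    printf '%s\n' "bwa mem -R '@RG\tID:${sample_id}\tPL:ILLUMINA\tLB:lib1' ${REF} ${sample_id}_1.fastq ${sample_id}_2.fastq > ${sample_id}.sam"
}

for idx in $(seq 1 "$N_PREVIEW")
do
    build_cmd "sample_${idx}"
done
```

Once the printed lines look right, the same text can be handed to the scheduler or written to a script file.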

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.5 years ago by Alex Reynolds 35k

0

Thanks for your help. I understand how to do this in a loop, but our cluster does not allow us to submit commands through qsub. Instead we may submit shell scripts. I'm trying to generate these files in a quick way. Is there a way to make the loop do this?

ADD REPLY • link 8.5 years ago by ebrown1955 ▴ 320

0

You might look into generating a template script file, and then using sed to replace a placeholder keyword in the template with your sample ID value.
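
As a rough sketch of that idea for a single sample: the template below is only an example of what such a file might contain, and the placeholder __SAMPLE__, the file names and the reference.fa path are all assumptions chosen for illustration.

```shell
#!/bin/bash
# Write an example template; __SAMPLE__ marks where the sample ID goes.
# The placeholder, file names and reference path are illustrative assumptions.
cat > template_file.sh <<'EOF'
#!/bin/bash
REF=reference.fa
bwa mem -R "@RG\tID:__SAMPLE__\tPL:ILLUMINA\tLB:lib1" "$REF" __SAMPLE___1.fastq __SAMPLE___2.fastq > __SAMPLE__.sam
EOF

# Fill in the placeholder for one sample and show the result.
sed 's/__SAMPLE__/sample_1/g' template_file.sh > sample_1.sh
cat sample_1.sh
```

The quoted 'EOF' keeps the shell from expanding $REF while the template is written, so the variable is resolved only when the generated script runs.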

ADD REPLY • link 8.5 years ago by Alex Reynolds 35k

4

Thank you, this is exactly what I ended up doing! For anyone interested, here is what I did:

# Read one sample ID per line and fill in the template for each
while read -r i
do
    sed "s/XXXX/$i/g" template_file.sh > "$i.sh"
done < samplenames_unique.txt

"XXXX" was the placeholder text for the sample name in the template file I created.

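If you would rather not keep a separate template file, a here-document inside the loop can write each script directly. This is a sketch under assumptions: the jobs/ directory, the default reference.fa path and the demo sample IDs (written only if samplenames_unique.txt does not exist yet) are made up for illustration.

```shell
#!/bin/bash
# Generate one job script per sample ID with a here-document instead of a template file.
# The jobs/ directory, the REF default and the demo IDs are illustrative assumptions.
REF="${REF:-reference.fa}"
[ -f samplenames_unique.txt ] || printf '%s\n' demoA demoB > samplenames_unique.txt
mkdir -p jobs

while read -r sample
do
    [ -n "$sample" ] || continue    # skip blank lines
    # Unquoted EOF: ${sample} and ${REF} are expanded now, \t is left untouched.
    cat > "jobs/${sample}.sh" <<EOF
#!/bin/bash
bwa mem -R "@RG\tID:${sample}\tPL:ILLUMINA\tLB:lib1" "${REF}" ${sample}_1.fastq ${sample}_2.fastq > ${sample}.sam
EOF
    chmod +x "jobs/${sample}.sh"
done < samplenames_unique.txt

# Syntax-check every generated script without running it.
for script in jobs/*.sh
do
    bash -n "$script" || echo "syntax problem in $script" >&2
done
```

The bash -n pass only parses the scripts, so it is a cheap check to run before submitting anything.
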
ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.5 years ago by ebrown1955 ▴ 320