Generating shell scripts
8.5 years ago
ebrown1955 ▴ 320

I have 1000 samples that I need to run through a certain pipeline. Are there any programs that can generate shell scripts for each sample?

For example, suppose I have a command to run BWA on one sample:

bwa mem -R "@RG\tID:[sample]\tPL:ILLUMINA\tLB:lib1" $REF [sample]_1.fastq [sample]_2.fastq > [sample].sam

How can I generate this script for 1000 samples (replacing "[sample]" with the respective sample ID)? Are there any tools that do this sort of batch processing? I've heard of people using make, but I'm unsure what they mean.

I know I can write a loop that processes each file in turn, but I'd like to submit the jobs separately, as there is a time limit on each job I submit to my cluster.

8.5 years ago

Yes, you can generate these commands using make: a makefile can produce one bwa statement per sample.

Running make with the -j option will parallelize the jobs. See also: https://github.com/lindenb/ngsxml
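As a rough sketch of the idea (not the answerer's original makefile), the hypothetical shell function below writes a Makefile from a sample list with one ID per line. It assumes reads named <ID>_1.fastq and <ID>_2.fastq, as in the question, and uses a GNU make pattern rule so that each <ID>.sam target gets its own bwa command.

```shell
#!/bin/bash
# Hypothetical helper (not the original answer's makefile): write a Makefile
# with one <ID>.sam target per sample ID listed, one per line, in a text file.
# Assumes paired reads named <ID>_1.fastq and <ID>_2.fastq, as in the question.
write_makefile() {
    local samples_file="$1"
    local sample
    {
        # Default target: every .sam file that should exist when make finishes.
        printf 'all:'
        while IFS= read -r sample || [ -n "$sample" ]; do
            if [ -n "$sample" ]; then
                printf ' %s.sam' "$sample"
            fi
        done < "$samples_file"
        printf '\n\n'
        # Pattern rule: $* is the sample ID (the stem), $^ the two FASTQ
        # prerequisites and $@ the .sam target. Recipe lines must start with a tab.
        printf '%%.sam: %%_1.fastq %%_2.fastq\n'
        printf '\t%s\n' "bwa mem -R '@RG\\tID:\$*\\tPL:ILLUMINA\\tLB:lib1' \$(REF) \$^ > \$@"
        # GNU make special target: delete a half-written .sam if bwa fails.
        printf '\n.DELETE_ON_ERROR:\n'
    } > Makefile
}
```

After `write_makefile samples.txt`, `make -n REF=/path/to/ref.fa` prints the commands without running them, and `make -j 8 REF=/path/to/ref.fa` runs up to 8 samples at once (the file name and reference path are placeholders). Because make only rebuilds targets that are missing or older than their inputs, rerunning it after a failure skips samples that already finished.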

8.5 years ago

Depending on your sample naming scheme, perhaps use a for loop with seq:

#!/bin/bash

for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    # $REF must point to the BWA-indexed reference.
    bwa mem -R "@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1" "$REF" "${SAMPLE_ID}_1.fastq" "${SAMPLE_ID}_2.fastq" > "${SAMPLE_ID}.sam"
done

If it isn't clear what this does, you might first run this to see how it works:

#!/bin/bash

for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    echo "${SAMPLE_ID}"
done
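If your IDs are not numbered like sample_1 ... sample_1000, the same loop can read them from a text file instead. The hypothetical function below assumes one ID per line and only prints the full bwa commands, so you can inspect them before running anything.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) one bwa command per sample ID listed,
# one per line, in a text file. Review the output, then run it with bash.
print_bwa_commands() {
    local samples_file="$1"
    local sample_id
    while IFS= read -r sample_id || [ -n "$sample_id" ]; do
        # Skip blank lines in the sample list.
        [ -z "$sample_id" ] && continue
        # "$REF" is written literally and expanded by the shell that runs the output.
        printf '%s\n' "bwa mem -R '@RG\\tID:${sample_id}\\tPL:ILLUMINA\\tLB:lib1' \"\$REF\" ${sample_id}_1.fastq ${sample_id}_2.fastq > ${sample_id}.sam"
    done < "$samples_file"
}
```

For example, `print_bwa_commands samples.txt > commands.sh` writes the commands for review, and `REF=/path/to/ref.fa bash commands.sh` then runs them (both file names are placeholders).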

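If you are submitting jobs to a cluster (say, with qsub), then you just call qsub within the for loop. Note that qsub expects a job script rather than a bare command line; PBS/Torque and SGE both read that script from standard input when no file is given, so a here-document works:

#!/bin/bash

for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    # Add your scheduler's options (queue, walltime, ...) after qsub.
    # Jobs may start in your home directory, so the script cds back here first.
    qsub -N "bwa_${SAMPLE_ID}" <<EOF
cd "${PWD}"
bwa mem -R '@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1' "${REF}" ${SAMPLE_ID}_1.fastq ${SAMPLE_ID}_2.fastq > ${SAMPLE_ID}.sam
EOF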
done

To be a good neighbor to your fellow cluster users, you will want to debug and understand how this script will run before you submit 1000 jobs to your cluster.
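One way to do that debugging is a dry-run switch. In the hypothetical wrapper below, DRY_RUN=1 (the default) only prints each job; setting DRY_RUN=0 pipes it to qsub on standard input. The function and variable names are illustrative, not part of any scheduler.

```shell
#!/bin/bash
# Hypothetical dry-run wrapper: with DRY_RUN=1 (the default) each job is only
# printed; set DRY_RUN=0 to actually pipe it to qsub.
DRY_RUN="${DRY_RUN:-1}"

submit_sample() {
    local sample_id="$1"
    local job
    # $REF is expanded now, so set it before calling this function.
    job="bwa mem -R '@RG\\tID:${sample_id}\\tPL:ILLUMINA\\tLB:lib1' \"${REF}\" ${sample_id}_1.fastq ${sample_id}_2.fastq > ${sample_id}.sam"
    if [ "$DRY_RUN" = "1" ]; then
        printf '[dry run] %s\n' "$job"
    else
        # qsub reads the job script from standard input when no file is given;
        # add your site's options (queue, walltime, ...) here.
        printf '%s\n' "$job" | qsub -N "bwa_${sample_id}"
    fi
}
```

A cautious workflow is to call it for the first few samples only, e.g. `for idx in $(seq 1 3); do submit_sample "sample_${idx}"; done`, check the printed jobs, and only then set DRY_RUN=0 and loop over all 1000.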

Thanks for your help. I understand how to do this in a loop, but our cluster does not allow us to submit commands through qsub; instead, we may only submit shell scripts. I'm trying to find a quick way to generate these files. Is there a way to make the loop do this?

You might look into generating a template script file, and then using sed to replace a placeholder keyword in the template with your sample ID value.

Thank you, this is exactly what I ended up doing! For anyone interested, here is what I did:

samplenames=$(cat samplenames_unique.txt)

for i in $samplenames
do
    sed "s/XXXX/${i}/g" template_file.sh > "${i}.sh"
done

"XXXX" was the placeholder text for the sample name in the template file I created.
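If you would rather not maintain a separate template file, the loop can also write each script directly with a here-document. This is a sketch assuming a sample list with one ID per line and a REF variable set beforehand; if your cluster expects scheduler directive lines at the top of a job script, they would go right after the shebang line.

```shell
#!/bin/bash
# Hypothetical generator: write an executable <ID>.sh for every sample ID
# listed, one per line, in a text file, without a separate template file.
write_job_scripts() {
    local samples_file="$1"
    local sample
    while IFS= read -r sample || [ -n "$sample" ]; do
        [ -z "$sample" ] && continue
        # Unquoted EOF: ${sample} and ${REF} are substituted now, while the
        # backslash in \t is kept so that bwa sees a literal \t.
        cat > "${sample}.sh" <<EOF
#!/bin/bash
bwa mem -R '@RG\tID:${sample}\tPL:ILLUMINA\tLB:lib1' "${REF}" ${sample}_1.fastq ${sample}_2.fastq > ${sample}.sam
EOF
        chmod +x "${sample}.sh"
    done < "$samples_file"
}
```

For example, `REF=/path/to/ref.fa; write_job_scripts samplenames_unique.txt` would leave one executable script per sample in the current directory, ready to submit one by one.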
