Question: Generating shell scripts
ebrown1955 (United States) wrote, 4.4 years ago:

I have 1000 samples that I need to run through a certain pipeline. Are there any programs that can generate shell scripts for each sample?

For example, if I have a command to run BWA on a sample, e.g.

bwa mem -R "@RG\tID:[sample]\tPL:ILLUMINA\tLB:lib1" $REF [sample]_1.fastq [sample]_2.fastq > [sample].sam

How can I generate this script for 1000 samples, replacing "[sample]" with the respective sample ID? Are there any tools that do this sort of batch processing? I've heard of people using make, but I'm unsure what they mean.

I know I can write a loop that processes each file in turn. However, I'd like to submit the jobs separately, as there is a time limit on each job I can submit to my cluster.

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 4.4 years ago:

Yes, you can generate this using make. See the example below:

This Makefile will generate the following statements:

Using make with the -j option will run the jobs in parallel. See also: https://github.com/lindenb/ngsxml
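
The Makefile from this answer is not reproduced in the thread. As a rough illustration of the idea only (this is not the original Makefile), the sketch below writes a small GNU make file with a single pattern rule that covers every sample. The file names samplenames_unique.txt and ref.fa, the sample IDs, and the empty placeholder FASTQ files are all assumptions made for the example.

```shell
#!/bin/bash
# Sketch: write a Makefile with one pattern rule that covers every sample.
# Hypothetical names: samplenames_unique.txt (one ID per line), ref.fa.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' sample_1 sample_2 sample_3 > samplenames_unique.txt

{
    printf '%s\n' 'SAMPLES := $(shell cat samplenames_unique.txt)'
    printf '%s\n' 'REF ?= ref.fa' ''
    printf '%s\n' 'all: $(addsuffix .sam,$(SAMPLES))' ''
    printf '%s\n' '%.sam: %_1.fastq %_2.fastq'
    # Recipe lines must start with a tab, so it is emitted explicitly as \t.
    printf '\t%s\n' 'bwa mem -R "@RG\tID:$*\tPL:ILLUMINA\tLB:lib1" $(REF) $^ > $@'
} > Makefile

# Empty placeholder inputs so that a dry run can resolve the pattern rule.
while IFS= read -r s
do
    touch "${s}_1.fastq" "${s}_2.fastq"
done < samplenames_unique.txt

cat Makefile

# Dry run: print the commands make would execute, without running bwa.
if command -v make >/dev/null 2>&1; then
    make -n
fi
```

With real FASTQ files in place, `make -n` previews the commands and something like `make -j 8` would run up to eight alignments at once. In the rule, `$*` is the part of the file name matched by `%`, `$^` is the list of input files, and `$@` is the output file.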

Alex Reynolds (Seattle, WA USA) wrote, 4.4 years ago:

Depending on your sample naming scheme, perhaps use a for loop with seq:

#!/bin/bash

# Align each sample; assumes files are named sample_1_1.fastq, sample_1_2.fastq, and so on
for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    bwa mem -R "@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1" "$REF" "${SAMPLE_ID}_1.fastq" "${SAMPLE_ID}_2.fastq" > "${SAMPLE_ID}.sam"
done

If it isn't clear what this does, you might first run this to see how it works:

#!/bin/bash

for idx in `seq 1 1000`
do
    SAMPLE_ID="sample_${idx}"
    echo ${SAMPLE_ID}
done
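
If your sample names do not follow a simple numbering scheme, one option is to read the IDs from a list instead of generating them with seq. This is a minimal sketch; the file name sample_ids.txt and the three IDs in it are made up for the example.

```shell
#!/bin/bash
# Sketch: take sample IDs from a list instead of generating them with seq.
# Hypothetical file name: sample_ids.txt, one ID per line.
workdir=$(mktemp -d)
ids_file="$workdir/sample_ids.txt"
printf '%s\n' ctrl_A ctrl_B treat_A > "$ids_file"

count=0
while IFS= read -r SAMPLE_ID
do
    # Skip blank lines so that they do not produce empty sample names.
    [ -z "$SAMPLE_ID" ] && continue
    echo "${SAMPLE_ID}: ${SAMPLE_ID}_1.fastq ${SAMPLE_ID}_2.fastq -> ${SAMPLE_ID}.sam"
    count=$((count + 1))
done < "$ids_file"
echo "${count} samples listed"
```

Once the printed file names look right, the echo line can be swapped for the real bwa command.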

If you are submitting jobs to a cluster (say, with qsub), you can call qsub within the for loop:

#!/bin/bash

for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    # With no script file, qsub (SGE, PBS/Torque) reads the job script from standard input; put your scheduler options after qsub
    printf '%s\n' "bwa mem -R '@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1' $REF ${SAMPLE_ID}_1.fastq ${SAMPLE_ID}_2.fastq > ${SAMPLE_ID}.sam" | qsub
done

To be a good neighbor to your fellow cluster users, debug the script and make sure you understand how it will run before you submit 1000 jobs to your cluster.
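
One way to do that kind of check is a dry-run switch that prints what would be submitted, tried first on a handful of samples. This is only a sketch: DRY_RUN and MAX_SAMPLES are hypothetical variable names, ref.fa is a placeholder reference, and piping the command to qsub assumes a scheduler whose qsub reads the job script from standard input.

```shell
#!/bin/bash
# Sketch: a dry-run switch for the submission loop.
# DRY_RUN and MAX_SAMPLES are hypothetical variable names.
DRY_RUN=${DRY_RUN:-1}          # 1 = only print what would be submitted
MAX_SAMPLES=${MAX_SAMPLES:-3}  # limit the first test to a handful of samples
REF=${REF:-ref.fa}             # placeholder reference path

submitted=0
for idx in $(seq 1 "$MAX_SAMPLES")
do
    SAMPLE_ID="sample_${idx}"
    cmd="bwa mem -R '@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1' ${REF} ${SAMPLE_ID}_1.fastq ${SAMPLE_ID}_2.fastq > ${SAMPLE_ID}.sam"
    if [ "$DRY_RUN" -eq 1 ]; then
        printf 'would submit: %s\n' "$cmd"
    else
        # Assumes qsub reads the job script from standard input.
        printf '%s\n' "$cmd" | qsub
    fi
    submitted=$((submitted + 1))
done
echo "${submitted} jobs processed (DRY_RUN=${DRY_RUN})"
```

Running it with the defaults only prints three commands. Setting DRY_RUN=0 and raising MAX_SAMPLES would submit for real, so that step is best taken once the printed commands look correct.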


Thanks for your help. I understand how to do this in a loop, but our cluster does not allow us to submit commands through qsub. Instead we may submit shell scripts. I'm trying to generate these files in a quick way. Is there a way to make the loop do this?

written 4.3 years ago by ebrown1955

You might look into generating a template script file, and then using sed to replace a placeholder keyword in the template with your sample ID value.

written 4.3 years ago by Alex Reynolds

Thank you, this is exactly what I ended up doing! For anyone interested, here is what I did:

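# samplenames_unique.txt holds the sample IDs, separated by whitespace or newlines
samplenames=$(cat samplenames_unique.txt)

for i in $samplenames
do
    # Replace every XXXX in the template with the sample ID and write one script per sample
    sed "s/XXXX/$i/g" template_file.sh > "$i.sh"
done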

"XXXX" was the placeholder text for the sample name in the template file I created.

modified 3 months ago by RamRS • written 4.3 years ago by ebrown1955
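
For readers who want to try the template approach end to end, here is a self-contained sketch. The file names and the XXXX placeholder follow the post above, but the template body and the two sample IDs are assumptions, since the actual template was not posted.

```shell
#!/bin/bash
# Sketch: create a template, then write one job script per sample with sed.
# The template body below is an assumption; the real one was not posted.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' sample_1 sample_2 > samplenames_unique.txt

# The quoted 'EOF' keeps $REF and \t literal inside the template.
cat > template_file.sh <<'EOF'
#!/bin/bash
bwa mem -R "@RG\tID:XXXX\tPL:ILLUMINA\tLB:lib1" "$REF" XXXX_1.fastq XXXX_2.fastq > XXXX.sam
EOF

while IFS= read -r sample
do
    # Skip blank lines so that no empty-named script is written.
    [ -z "$sample" ] && continue
    sed "s/XXXX/${sample}/g" template_file.sh > "${sample}.sh"
    chmod +x "${sample}.sh"
done < samplenames_unique.txt

# Show one generated script for inspection.
cat sample_1.sh
```

Each generated file can then be submitted on its own, for example with `qsub sample_1.sh`. Note that this simple substitution would break if a sample ID contained a `/` or `&`, because sed treats both as special in the replacement text.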