Generating shell scripts
9.6 years ago
ebrown1955 ▴ 320

I have 1000 samples that I need to run through a certain pipeline. Are there any programs that can generate shell scripts for each sample?

For example, suppose I have a command that runs BWA on one sample:

bwa mem -R "@RG\tID:[sample]\tPL:ILLUMINA\tLB:lib1" $REF [sample]_1.fastq [sample]_2.fastq > [sample].sam

How can I generate this script for 1000 samples, replacing "[sample]" with the respective sample ID? Are there any tools that do this sort of batch processing? I've heard of people using make, but I'm unsure what they mean.

I know I can write a loop that processes each file in turn, but I'd like to submit the jobs separately, because my cluster imposes a time limit on each submitted job.

9.6 years ago

Yes, you can generate this using make. See the example below:

SAMPLES=$(addprefix SAMPLE,1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50)
REF=ref.fa

define make_bam
$(addsuffix .bam,$(1)): ${REF}.bwt $(addsuffix _1.fastq.gz,$(1)) $(addsuffix _2.fastq.gz,$(1))
	bwa mem -R '@RG\tID:$(1)\tSM:$(1)' ${REF} $$(filter %.fastq.gz,$$^) |\
	samtools view -Sbu - |\
	samtools sort - $$(basename $$@) && \
	samtools index $$@
endef

result.vcf : ${REF}.fai $(addsuffix .bam,${SAMPLES})
	samtools mpileup -uSD -f ${REF} $(filter %.bam,$^) | bcftools view -vcg - > $@

# one eval per sample, so each expanded multi-line rule is parsed on its own
$(foreach S,${SAMPLES},$(eval $(call make_bam,${S})))

${REF}.bwt: ${REF}
	bwa index $<

${REF}.fai: ${REF}
	samtools faidx $<

Running make -n (a dry run) prints the commands this Makefile would execute, without running them:

$ make -n
samtools faidx ref.fa
bwa index ref.fa
bwa mem -R '@RG\tID:SAMPLE1\tSM:SAMPLE1' ref.fa SAMPLE1_1.fastq.gz SAMPLE1_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE1 && samtools index SAMPLE1.bam
bwa mem -R '@RG\tID:SAMPLE2\tSM:SAMPLE2' ref.fa SAMPLE2_1.fastq.gz SAMPLE2_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE2 && samtools index SAMPLE2.bam
bwa mem -R '@RG\tID:SAMPLE3\tSM:SAMPLE3' ref.fa SAMPLE3_1.fastq.gz SAMPLE3_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE3 && samtools index SAMPLE3.bam
bwa mem -R '@RG\tID:SAMPLE4\tSM:SAMPLE4' ref.fa SAMPLE4_1.fastq.gz SAMPLE4_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE4 && samtools index SAMPLE4.bam
bwa mem -R '@RG\tID:SAMPLE5\tSM:SAMPLE5' ref.fa SAMPLE5_1.fastq.gz SAMPLE5_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE5 && samtools index SAMPLE5.bam
bwa mem -R '@RG\tID:SAMPLE6\tSM:SAMPLE6' ref.fa SAMPLE6_1.fastq.gz SAMPLE6_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE6 && samtools index SAMPLE6.bam
bwa mem -R '@RG\tID:SAMPLE7\tSM:SAMPLE7' ref.fa SAMPLE7_1.fastq.gz SAMPLE7_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE7 && samtools index SAMPLE7.bam
bwa mem -R '@RG\tID:SAMPLE8\tSM:SAMPLE8' ref.fa SAMPLE8_1.fastq.gz SAMPLE8_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE8 && samtools index SAMPLE8.bam
bwa mem -R '@RG\tID:SAMPLE9\tSM:SAMPLE9' ref.fa SAMPLE9_1.fastq.gz SAMPLE9_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE9 && samtools index SAMPLE9.bam
bwa mem -R '@RG\tID:SAMPLE10\tSM:SAMPLE10' ref.fa SAMPLE10_1.fastq.gz SAMPLE10_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE10 && samtools index SAMPLE10.bam
bwa mem -R '@RG\tID:SAMPLE11\tSM:SAMPLE11' ref.fa SAMPLE11_1.fastq.gz SAMPLE11_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE11 && samtools index SAMPLE11.bam
bwa mem -R '@RG\tID:SAMPLE12\tSM:SAMPLE12' ref.fa SAMPLE12_1.fastq.gz SAMPLE12_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE12 && samtools index SAMPLE12.bam
bwa mem -R '@RG\tID:SAMPLE13\tSM:SAMPLE13' ref.fa SAMPLE13_1.fastq.gz SAMPLE13_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE13 && samtools index SAMPLE13.bam
bwa mem -R '@RG\tID:SAMPLE14\tSM:SAMPLE14' ref.fa SAMPLE14_1.fastq.gz SAMPLE14_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE14 && samtools index SAMPLE14.bam
bwa mem -R '@RG\tID:SAMPLE15\tSM:SAMPLE15' ref.fa SAMPLE15_1.fastq.gz SAMPLE15_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE15 && samtools index SAMPLE15.bam
bwa mem -R '@RG\tID:SAMPLE16\tSM:SAMPLE16' ref.fa SAMPLE16_1.fastq.gz SAMPLE16_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE16 && samtools index SAMPLE16.bam
bwa mem -R '@RG\tID:SAMPLE17\tSM:SAMPLE17' ref.fa SAMPLE17_1.fastq.gz SAMPLE17_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE17 && samtools index SAMPLE17.bam
bwa mem -R '@RG\tID:SAMPLE18\tSM:SAMPLE18' ref.fa SAMPLE18_1.fastq.gz SAMPLE18_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE18 && samtools index SAMPLE18.bam
bwa mem -R '@RG\tID:SAMPLE19\tSM:SAMPLE19' ref.fa SAMPLE19_1.fastq.gz SAMPLE19_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE19 && samtools index SAMPLE19.bam
bwa mem -R '@RG\tID:SAMPLE20\tSM:SAMPLE20' ref.fa SAMPLE20_1.fastq.gz SAMPLE20_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE20 && samtools index SAMPLE20.bam
bwa mem -R '@RG\tID:SAMPLE21\tSM:SAMPLE21' ref.fa SAMPLE21_1.fastq.gz SAMPLE21_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE21 && samtools index SAMPLE21.bam
bwa mem -R '@RG\tID:SAMPLE22\tSM:SAMPLE22' ref.fa SAMPLE22_1.fastq.gz SAMPLE22_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE22 && samtools index SAMPLE22.bam
bwa mem -R '@RG\tID:SAMPLE23\tSM:SAMPLE23' ref.fa SAMPLE23_1.fastq.gz SAMPLE23_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE23 && samtools index SAMPLE23.bam
bwa mem -R '@RG\tID:SAMPLE24\tSM:SAMPLE24' ref.fa SAMPLE24_1.fastq.gz SAMPLE24_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE24 && samtools index SAMPLE24.bam
bwa mem -R '@RG\tID:SAMPLE25\tSM:SAMPLE25' ref.fa SAMPLE25_1.fastq.gz SAMPLE25_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE25 && samtools index SAMPLE25.bam
bwa mem -R '@RG\tID:SAMPLE26\tSM:SAMPLE26' ref.fa SAMPLE26_1.fastq.gz SAMPLE26_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE26 && samtools index SAMPLE26.bam
bwa mem -R '@RG\tID:SAMPLE27\tSM:SAMPLE27' ref.fa SAMPLE27_1.fastq.gz SAMPLE27_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE27 && samtools index SAMPLE27.bam
bwa mem -R '@RG\tID:SAMPLE28\tSM:SAMPLE28' ref.fa SAMPLE28_1.fastq.gz SAMPLE28_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE28 && samtools index SAMPLE28.bam
bwa mem -R '@RG\tID:SAMPLE29\tSM:SAMPLE29' ref.fa SAMPLE29_1.fastq.gz SAMPLE29_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE29 && samtools index SAMPLE29.bam
bwa mem -R '@RG\tID:SAMPLE30\tSM:SAMPLE30' ref.fa SAMPLE30_1.fastq.gz SAMPLE30_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE30 && samtools index SAMPLE30.bam
bwa mem -R '@RG\tID:SAMPLE31\tSM:SAMPLE31' ref.fa SAMPLE31_1.fastq.gz SAMPLE31_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE31 && samtools index SAMPLE31.bam
bwa mem -R '@RG\tID:SAMPLE32\tSM:SAMPLE32' ref.fa SAMPLE32_1.fastq.gz SAMPLE32_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE32 && samtools index SAMPLE32.bam
bwa mem -R '@RG\tID:SAMPLE33\tSM:SAMPLE33' ref.fa SAMPLE33_1.fastq.gz SAMPLE33_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE33 && samtools index SAMPLE33.bam
bwa mem -R '@RG\tID:SAMPLE34\tSM:SAMPLE34' ref.fa SAMPLE34_1.fastq.gz SAMPLE34_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE34 && samtools index SAMPLE34.bam
bwa mem -R '@RG\tID:SAMPLE35\tSM:SAMPLE35' ref.fa SAMPLE35_1.fastq.gz SAMPLE35_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE35 && samtools index SAMPLE35.bam
bwa mem -R '@RG\tID:SAMPLE36\tSM:SAMPLE36' ref.fa SAMPLE36_1.fastq.gz SAMPLE36_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE36 && samtools index SAMPLE36.bam
bwa mem -R '@RG\tID:SAMPLE37\tSM:SAMPLE37' ref.fa SAMPLE37_1.fastq.gz SAMPLE37_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE37 && samtools index SAMPLE37.bam
bwa mem -R '@RG\tID:SAMPLE38\tSM:SAMPLE38' ref.fa SAMPLE38_1.fastq.gz SAMPLE38_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE38 && samtools index SAMPLE38.bam
bwa mem -R '@RG\tID:SAMPLE39\tSM:SAMPLE39' ref.fa SAMPLE39_1.fastq.gz SAMPLE39_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE39 && samtools index SAMPLE39.bam
bwa mem -R '@RG\tID:SAMPLE40\tSM:SAMPLE40' ref.fa SAMPLE40_1.fastq.gz SAMPLE40_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE40 && samtools index SAMPLE40.bam
bwa mem -R '@RG\tID:SAMPLE41\tSM:SAMPLE41' ref.fa SAMPLE41_1.fastq.gz SAMPLE41_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE41 && samtools index SAMPLE41.bam
bwa mem -R '@RG\tID:SAMPLE42\tSM:SAMPLE42' ref.fa SAMPLE42_1.fastq.gz SAMPLE42_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE42 && samtools index SAMPLE42.bam
bwa mem -R '@RG\tID:SAMPLE43\tSM:SAMPLE43' ref.fa SAMPLE43_1.fastq.gz SAMPLE43_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE43 && samtools index SAMPLE43.bam
bwa mem -R '@RG\tID:SAMPLE44\tSM:SAMPLE44' ref.fa SAMPLE44_1.fastq.gz SAMPLE44_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE44 && samtools index SAMPLE44.bam
bwa mem -R '@RG\tID:SAMPLE45\tSM:SAMPLE45' ref.fa SAMPLE45_1.fastq.gz SAMPLE45_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE45 && samtools index SAMPLE45.bam
bwa mem -R '@RG\tID:SAMPLE46\tSM:SAMPLE46' ref.fa SAMPLE46_1.fastq.gz SAMPLE46_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE46 && samtools index SAMPLE46.bam
bwa mem -R '@RG\tID:SAMPLE47\tSM:SAMPLE47' ref.fa SAMPLE47_1.fastq.gz SAMPLE47_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE47 && samtools index SAMPLE47.bam
bwa mem -R '@RG\tID:SAMPLE48\tSM:SAMPLE48' ref.fa SAMPLE48_1.fastq.gz SAMPLE48_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE48 && samtools index SAMPLE48.bam
bwa mem -R '@RG\tID:SAMPLE49\tSM:SAMPLE49' ref.fa SAMPLE49_1.fastq.gz SAMPLE49_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE49 && samtools index SAMPLE49.bam
bwa mem -R '@RG\tID:SAMPLE50\tSM:SAMPLE50' ref.fa SAMPLE50_1.fastq.gz SAMPLE50_2.fastq.gz | samtools view -Sbu - | samtools sort - SAMPLE50 && samtools index SAMPLE50.bam
samtools mpileup -uSD -f ref.fa SAMPLE1.bam SAMPLE2.bam SAMPLE3.bam SAMPLE4.bam SAMPLE5.bam SAMPLE6.bam SAMPLE7.bam SAMPLE8.bam SAMPLE9.bam SAMPLE10.bam SAMPLE11.bam SAMPLE12.bam SAMPLE13.bam SAMPLE14.bam SAMPLE15.bam SAMPLE16.bam SAMPLE17.bam SAMPLE18.bam SAMPLE19.bam SAMPLE20.bam SAMPLE21.bam SAMPLE22.bam SAMPLE23.bam SAMPLE24.bam SAMPLE25.bam SAMPLE26.bam SAMPLE27.bam SAMPLE28.bam SAMPLE29.bam SAMPLE30.bam SAMPLE31.bam SAMPLE32.bam SAMPLE33.bam SAMPLE34.bam SAMPLE35.bam SAMPLE36.bam SAMPLE37.bam SAMPLE38.bam SAMPLE39.bam SAMPLE40.bam SAMPLE41.bam SAMPLE42.bam SAMPLE43.bam SAMPLE44.bam SAMPLE45.bam SAMPLE46.bam SAMPLE47.bam SAMPLE48.bam SAMPLE49.bam SAMPLE50.bam | bcftools view -vcg - > result.vcf

Running make with the -j option (e.g. make -j 8) runs independent targets in parallel. See also: https://github.com/lindenb/ngsxml
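Because the cluster limits the run time of each job, one way to combine this Makefile with separate submissions is to write one small job script per sample, each asking make for a single target (e.g. SAMPLE1.bam). A minimal sketch, assuming the Makefile above sits in the current directory; the function name write_make_jobs and the jobs/ directory are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: write one job script per sample; each script runs make
# for that sample's BAM target only, so every submitted job stays small.
# Usage: write_make_jobs OUTDIR SAMPLE...
write_make_jobs() {
    local outdir=$1 workdir sample
    shift
    mkdir -p "$outdir"
    workdir=$(pwd)   # directory that contains the Makefile
    for sample in "$@"; do
        {
            echo '#!/bin/bash'
            printf 'cd %q || exit 1\n' "$workdir"
            printf 'make %q\n' "${sample}.bam"
        } > "${outdir}/${sample}.sh"
        chmod +x "${outdir}/${sample}.sh"
    done
}
```

For example, write_make_jobs jobs SAMPLE{1..50} writes jobs/SAMPLE1.sh through jobs/SAMPLE50.sh, which can then be submitted one at a time. It is probably wise to build the index once (make ref.fa.bwt) before submitting, so that concurrent jobs do not all try to run bwa index; result.vcf can be built afterwards with a plain make.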

9.6 years ago

Depending on your sample naming scheme, you could use a for loop with seq:

#!/bin/bash
# REF must already point to the indexed reference FASTA

for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    bwa mem -R "@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1" "$REF" "${SAMPLE_ID}_1.fastq" "${SAMPLE_ID}_2.fastq" > "${SAMPLE_ID}.sam"
done
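If the sample names do not follow a numeric sequence, the IDs can instead be derived from the files themselves. A minimal sketch, assuming every sample has a pair of files named <ID>_1.fastq and <ID>_2.fastq in one directory; the function name list_sample_ids is hypothetical. It strips the directory and the _1.fastq suffix with bash parameter expansion:

```shell
#!/bin/bash
# Hypothetical helper: print one sample ID per line, derived from the
# <ID>_1.fastq files found in DIR (default: current directory).
list_sample_ids() {
    local dir=${1:-.} r1
    shopt -s nullglob            # no matching files -> empty loop, not a literal pattern
    for r1 in "$dir"/*_1.fastq; do
        r1=${r1##*/}             # drop the directory part
        echo "${r1%_1.fastq}"    # drop the _1.fastq suffix
    done
    shopt -u nullglob            # restore the default globbing behaviour
}
```

Its output can replace seq in the loop above, e.g. for SAMPLE_ID in $(list_sample_ids .), as long as the IDs contain no spaces.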

If it isn't clear what this does, first run the following to see which sample IDs the loop generates:

#!/bin/bash

for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    echo "${SAMPLE_ID}"
done

If you are submitting jobs to a cluster (say, with qsub), submit each command from within the for loop:

#!/bin/bash

for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    # PBS/SGE-style qsub reads the job script from standard input; add your ...options... after qsub
    echo "bwa mem -R '@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1' $REF ${SAMPLE_ID}_1.fastq ${SAMPLE_ID}_2.fastq > ${SAMPLE_ID}.sam" | qsub
done

To be a good neighbor to your fellow cluster users, debug and understand how this script will run before you submit 1000 jobs to your cluster.
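One way to do that debugging is a dry-run switch: with DRY_RUN=1 the qsub loop only prints what it would submit, and nothing reaches the scheduler. A minimal sketch; the DRY_RUN variable and the submit_all function are hypothetical, qsub is assumed to read the job script from standard input (PBS/SGE behaviour when no script file is given), and REF is assumed to be set:

```shell
#!/bin/bash
# Hypothetical sketch: submit one job per sample, or only print the
# job scripts when DRY_RUN=1. Usage: submit_all COUNT
submit_all() {
    local idx SAMPLE_ID job
    for idx in $(seq 1 "$1"); do
        SAMPLE_ID="sample_${idx}"
        # Single quotes keep \t literal, so bwa (not the shell) interprets it.
        job="bwa mem -R '@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1' ${REF} ${SAMPLE_ID}_1.fastq ${SAMPLE_ID}_2.fastq > ${SAMPLE_ID}.sam"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s\n' "$job"            # dry run: print only
        else
            printf '%s\n' "$job" | qsub     # assumed: qsub reads the script from stdin
        fi
    done
}
```

For example, DRY_RUN=1 submit_all 3 prints three commands for inspection; once they look right, submit_all 1000 submits for real.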

Thanks for your help. I understand how to do this in a loop, but our cluster does not allow us to submit commands through qsub; instead, we have to submit shell scripts. I'm looking for a quick way to generate these script files. Is there a way to make the loop do this?

You might look into writing a template script file and then using sed to replace a placeholder keyword in the template with each sample ID.

Thank you, this is exactly what I ended up doing! For anyone interested, here is what I did:

samplenames=$(cat samplenames_unique.txt)

for i in $samplenames
do
    sed "s/XXXX/${i}/g" template_file.sh > "${i}.sh"
done

"XXXX" was the placeholder text for the sample name in the template file I created.
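A variant of the same idea keeps the template inside the loop as a here-document, so no separate template file or sed call is needed. A minimal sketch, assuming sample names are listed one per line in a file such as samplenames_unique.txt and REF is set; the function name write_job_scripts and the bwa job body are illustrative only:

```shell
#!/bin/bash
# Hypothetical sketch: write one job script per sample name read from LISTFILE.
# Usage: write_job_scripts LISTFILE OUTDIR
write_job_scripts() {
    local list=$1 outdir=$2 i
    mkdir -p "$outdir"
    while IFS= read -r i; do
        [ -n "$i" ] || continue   # skip blank lines
        # Unquoted EOF: ${i} and ${REF} are filled in now, while \t stays literal.
        cat > "${outdir}/${i}.sh" <<EOF
#!/bin/bash
bwa mem -R '@RG\tID:${i}\tPL:ILLUMINA\tLB:lib1' ${REF} ${i}_1.fastq ${i}_2.fastq > ${i}.sam
EOF
        chmod +x "${outdir}/${i}.sh"
    done < "$list"
}
```

For example, write_job_scripts samplenames_unique.txt jobs writes jobs/<sample>.sh for every name in the list. Reading the list with while read, rather than word-splitting the output of cat, also keeps each line intact.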
