Using Bash For Gatk Pipeline
8.3 years ago
Paul ★ 1.4k

Hello my bio-friends,

Can anybody please help me fix my input arguments in Bash for GATK? I am trying to write a Bash script that automates the GATK processing steps (local realignment around indels, BQSR, and calling raw variants) for all my *bam samples in the current directory.

I have a problem in step two, realigning the BAM file: there are two input variables, because each sample has a *table.list file and a *_raw.bam file.

When I use code like this:

echo 'Realigning step starting'

for j in *list *bam
do
java -Xmx32g -jar $gatk -T IndelRealigner -I $j -R $reference -targetIntervals $j -o ${i%.list}.realignedBam.bam
done; echo 'Realigned step is done!!!'

I have an error log:

##### ERROR MESSAGE: Couldn't read file in.bam because The interval file in.bam does not have one of the supported extensions (.bed, .list, .picard, .interval_list, or .intervals). Please rename your file with the appropriate extension.

I understand that in one for loop I have just one variable, j, and two input files. Is there any way to load two variables (*list, *bam) in one for loop for each sample?

I hope my question is clear... Thank you for any ideas and help.

Petr

8.3 years ago
Sudeep ★ 1.7k

Let's say that you have followed the same naming convention for your *list and *bam files, for example experiment1.bam and its corresponding experiment1.list. Then you can create another variable for your list file on the fly in the for loop:

for i in *.bam;
do
y=$(echo $i|sed 's/bam/list/g');
java -Xmx32g -jar $gatk -T IndelRealigner -I $i -R $reference -targetIntervals $y -o ${i%.bam}.realignedBam.bam;
done;


Here it is assumed that you have a file called experiment1.bam and its list file experiment1.list in your current working directory. In each iteration of the for loop, the variable i already holds the string experiment1.bam; in the second line we derive another variable, y, from it by replacing bam in i with list. So if everything is correct, i holds the path to experiment1.bam and y holds the path to experiment1.list, and both are passed as arguments to GATK.
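To make the name derivation concrete, here is a minimal sketch of that one step on its own, without calling GATK. The helper name list_for_bam is hypothetical and only wraps the same echo and sed pipeline described above; the file name is the example one from this answer.

```bash
#!/usr/bin/env bash
# list_for_bam is a hypothetical helper (not part of GATK or this thread).
# It prints the list-file name that matches a given BAM file name.
list_for_bam() {
    # Same idea as y=$(echo $i | sed 's/bam/list/g'):
    # send the name through sed and replace "bam" with "list".
    echo "$1" | sed 's/bam/list/g'
}

i="experiment1.bam"
y=$(list_for_bam "$i")   # command substitution captures what sed printed
echo "$i -> $y"          # experiment1.bam -> experiment1.list
```

Printing both names like this before running the real command is a cheap way to check that the pairing is what you expect.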

0
Wow, could you please explain what is happening here: y=$(echo $i|sed 's/bam/list/g'); ? Thank you!

1
It finds the string "bam" and replaces it with "list" in the $i variable. The alternative approach would be to just strip the extension off:

for i in *.bam
do
base_name=${i%.bam}
java -Xmx32g -jar $gatk -T IndelRealigner -I $i -R $reference -targetIntervals $base_name.list -o $base_name.realignedBam.bam;
done;

0
I have edited my original post

0
Thank you guys!! That works perfectly fine for me!!! Just add a semicolon after the assignment: do list_name=${j%.bam}; java -Xmx32g -.......
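The extension-stripping approach can be tried as a dry run that only prints the command for each sample. This is a sketch under assumptions: build_cmd is a hypothetical helper, and the values of gatk and reference are placeholders to replace with your own paths. The option names are the ones already used in this thread.

```bash
#!/usr/bin/env bash
# Placeholder values (assumptions): point these at your own files.
gatk="GenomeAnalysisTK.jar"
reference="reference.fasta"

# build_cmd is a hypothetical helper: it prints, but does not run,
# the IndelRealigner command for one BAM file.
build_cmd() {
    local bam=$1
    local base_name=${bam%.bam}   # strip a trailing ".bam" only
    echo "java -Xmx32g -jar $gatk -T IndelRealigner -I $bam -R $reference -targetIntervals $base_name.list -o $base_name.realignedBam.bam"
}

for i in *.bam; do
    [ -e "$i" ] || continue       # no BAM files here: the pattern stays literal
    base_name=${i%.bam}
    if [ ! -f "$base_name.list" ]; then
        # Skip samples whose interval list is missing instead of failing in GATK.
        echo "skipping $i: $base_name.list not found" >&2
        continue
    fi
    build_cmd "$i"
done
```

Written on one line, the same loop needs a semicolon between the assignment and the command, as noted above: for i in *.bam; do base_name=${i%.bam}; build_cmd "$i"; done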

0
Hmm, do you really need the g option in sed? It should work fine without it.

0
I know, it doesn't make much sense here, but I got used to using g in almost all cases, can't help it :)
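For a name like experiment1.bam the two forms give the same result, because "bam" occurs only once. A small sketch of where they differ, using a made-up file name that contains "bam" twice; the anchored pattern in the last line is an extra suggestion, not something from the answer above.

```bash
#!/usr/bin/env bash
# Made-up file name with "bam" in two places.
name="bam_sample.bam"

first_only=$(echo "$name" | sed 's/bam/list/')      # first match only
every_match=$(echo "$name" | sed 's/bam/list/g')    # all matches
anchored=$(echo "$name" | sed 's/\.bam$/.list/')    # only the extension

echo "$first_only"    # list_sample.bam
echo "$every_match"   # list_sample.list
echo "$anchored"      # bam_sample.list
```

So with or without g, an unanchored pattern can touch the wrong part of a file name; matching the literal ".bam" at the end of the name avoids that.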