Using Bash For Gatk Pipeline
8.3 years ago
Paul ★ 1.4k

Hello my bio-friends,

Can anybody please help me fix the input arguments in my Bash script for GATK? I am trying to write a Bash script that automates the GATK processing steps (local realignment around indels, BQSR, and calling raw variants) for all the *.bam samples in the current directory.

I have a problem with step two, realigning the BAM file. This step takes two inputs, and for each sample these are a *table.list file and a *_raw.bam file.

When I use code like this:

echo 'Realigning step starting'

for j in *list *bam
do
    java -Xmx32g -jar $gatk -T IndelRealigner -I $j -R $reference -targetIntervals $j -o ${i%.list}.realignedBam.bam
done

echo 'Realigned step is done!!!'

I get this error: ##### ERROR MESSAGE: Couldn't read file in.bam because The interval file in.bam does not have one of the supported extensions (.bed, .list, .picard, .interval_list, or .intervals). Please rename your file with the appropriate extension.

I understand that my for loop has only one loop variable, j, but I need two input files per sample. Is there a way to get both the *.list and the *.bam file for each sample in the same loop?
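To see why a single loop variable cannot pair the files, here is a small sketch that prints what j holds on each pass of the loop header from the question. The sample names are hypothetical, the files are empty placeholders in a temporary directory, and GATK is not called.

```bash
#!/usr/bin/env bash
# Sketch: what one loop variable holds when two globs are listed.
# The sample files are hypothetical, empty placeholders in a temp directory.
set -euo pipefail

show_loop_values() {
  local dir j
  dir=$(mktemp -d)
  touch "$dir/sampleA.list" "$dir/sampleB.list" "$dir/sampleA.bam" "$dir/sampleB.bam"
  (
    cd "$dir"
    # Same loop header as in the question: one value per pass,
    # first every *list match, then every *bam match.
    for j in *list *bam; do
      echo "j=$j"
    done
  )
  rm -rf "$dir"
}

show_loop_values
```

Every file is visited on its own pass, so on the passes where j is a .bam file it is also handed to -targetIntervals, which is consistent with the extension error above.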

I hope my question is clear...

Thank you for any ideas and help.

Petr.

gatk bash script
8.3 years ago
Sudeep ★ 1.7k

Let's say you have followed the same naming convention for your *.list and *.bam files, e.g. experiment1.bam and its corresponding experiment1.list. Then you can create a second variable for the list file on the fly inside the for loop:

for i in *.bam
do
    # derive experiment1.list from experiment1.bam (replaces every "bam" in the name)
    y=$(echo "$i" | sed 's/bam/list/g')
    java -Xmx32g -jar "$gatk" -T IndelRealigner -I "$i" -R "$reference" -targetIntervals "$y" -o "${i%.bam}.realignedBam.bam"
done

This assumes that a file called experiment1.bam and its list file experiment1.list are both in your current working directory. On each pass of the loop, i holds a string such as experiment1.bam. The second line copies that string into y, replacing "bam" with "list" along the way; i itself is not changed. If everything is set up correctly, i holds the path to experiment1.bam and y holds the path to experiment1.list, and both are then passed as arguments to GATK.
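A quick way to check the substitution before launching a long GATK job is to print the names the loop would use. The sketch below does only that; the output name is built with ${i%.bam}, which strips the trailing .bam, and experiment1.bam is a hypothetical sample name.

```bash
#!/usr/bin/env bash
# Sketch: print the names the loop would hand to GATK, without running it.
# "experiment1.bam" is a hypothetical sample name.
set -euo pipefail

derive_names() {
  local i=$1 y
  # Same sed substitution as the answer: every "bam" in the name becomes "list".
  y=$(printf '%s\n' "$i" | sed 's/bam/list/g')
  printf 'input BAM:  %s\n' "$i"
  printf 'intervals:  %s\n' "$y"
  # ${i%.bam} removes the trailing ".bam" before the new suffix is added.
  printf 'output:     %s\n' "${i%.bam}.realignedBam.bam"
}

derive_names experiment1.bam
```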

Wow, could you please explain what is happening here: y=$(echo $i|sed 's/bam/list/g')? Thank you!

It takes the value of $i, replaces the string "bam" with "list" (every occurrence, because of the g flag), and stores the result in $y. An alternative approach is to strip the .bam extension with parameter expansion and add .list yourself:

for i in *.bam
do
    base_name=${i%.bam}   # experiment1.bam -> experiment1
    java -Xmx32g -jar "$gatk" -T IndelRealigner -I "$i" -R "$reference" -targetIntervals "$base_name.list" -o "$base_name.realignedBam.bam"
done
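Before realigning every sample, a dry run can catch pairing problems. This sketch follows the base_name approach but echoes each IndelRealigner command instead of executing it, and skips any BAM without a matching .list file. The defaults GenomeAnalysisTK.jar and ref.fasta, and the sample names in the demo, are placeholders assumed for illustration; set $gatk and $reference to your real paths.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the base_name loop: print each command instead of running it.
# GenomeAnalysisTK.jar and ref.fasta are placeholder defaults (assumptions).
set -euo pipefail

dry_run() {
  local jar=${gatk:-GenomeAnalysisTK.jar}
  local ref=${reference:-ref.fasta}
  local i base_name
  for i in *.bam; do
    [ -e "$i" ] || continue              # no *.bam matched: the glob stays literal
    base_name=${i%.bam}
    if [ ! -e "$base_name.list" ]; then  # no intervals file for this sample
      echo "skip $i: $base_name.list not found" >&2
      continue
    fi
    echo java -Xmx32g -jar "$jar" -T IndelRealigner -I "$i" -R "$ref" \
      -targetIntervals "$base_name.list" -o "$base_name.realignedBam.bam"
  done
}

# Demo in a throwaway directory with hypothetical, empty files.
(
  cd "$(mktemp -d)"
  touch sample1.bam sample1.list sample2.bam
  dry_run
)
```

Once the printed commands look right, removing the echo in front of java turns the dry run into the real loop.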

I have edited my original post

Thank you, guys! That works perfectly for me. Just add a semicolon after the assignment when the loop body is written on one line: do list_name=${j%.bam}; java -Xmx32g ...
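The semicolon matters because, in Bash, name=value command sets name only in the environment of that one command, and $name in the command's own arguments is expanded before the assignment takes effect. A minimal sketch with a hypothetical file name:

```bash
#!/usr/bin/env bash
# Sketch: why "do list_name=${j%.bam}; java ..." needs the semicolon on one line.
demo_semicolon() {
  local j=sample1.bam   # hypothetical file name
  local list_name=""    # starts empty
  # No semicolon: the assignment only applies to echo's environment, and
  # $list_name in echo's arguments has already been expanded (still empty).
  list_name=${j%.bam} echo "without ; -> '$list_name'"
  # With semicolon: the assignment is a separate command and runs first.
  list_name=${j%.bam}; echo "with ;    -> '$list_name'"
}

demo_semicolon
```

When the loop body is split across lines, as in the answers above, the newline already separates the two commands and no semicolon is needed.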

Hmm, do you really need the g option in sed? It should work fine without it.

I know it doesn't make much sense here, but I've gotten used to using g in almost every case; can't help it :)
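For names like experiment1.bam the g flag makes no difference, because "bam" appears only once. It matters only when "bam" also appears elsewhere in the name, and then neither version gives the intended result; anchoring the pattern to the trailing .bam does. The file names below are hypothetical.

```bash
#!/usr/bin/env bash
# Compare the sed variants from this thread on hypothetical file names.
set -euo pipefail

to_list_g()        { printf '%s\n' "$1" | sed 's/bam/list/g'; }     # with g (as in the answer)
to_list_first()    { printf '%s\n' "$1" | sed 's/bam/list/'; }      # without g: first match only
to_list_anchored() { printf '%s\n' "$1" | sed 's/\.bam$/.list/'; }  # only a trailing ".bam"

for name in experiment1.bam bam_run1.bam; do
  echo "$name -> g: $(to_list_g "$name"), no g: $(to_list_first "$name"), anchored: $(to_list_anchored "$name")"
done
```

For bam_run1.bam the g version yields list_run1.list and the version without g yields list_run1.bam, while the anchored pattern gives bam_run1.list, the same result as the ${i%.bam}.list approach.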
