Question: Using Bash For Gatk Pipeline
1
3.2 years ago by
Paul490
European Union
Paul490 wrote:

Hello my bio-friends,

Can anybody please help me fix my input arguments in Bash for GATK? I am trying to write a Bash script to automate the GATK processing steps (local realignment around indels, BQSR, and calling raw variants) for all my *bam samples in the current directory.

I have a problem in step two, realigning the BAM file: there are two input files for each sample, a *table.list file and a *_raw.bam file.

When I use code like this:

echo 'Realigning step starting'

for j 
in *list *bam    
do java -Xmx32g -jar $gatk -T IndelRealigner -I $j -R $reference -targetIntervals $j -o ${i%.list}.realignedBam.bam
done;

echo 'Realigned step is done!!!'

I have an error log: ##### ERROR MESSAGE: Couldn't read file in.bam because The interval file in.bam does not have one of the supported extensions (.bed, .list, .picard, .interval_list, or .intervals). Please rename your file with the appropriate extension.

I understand that in one for loop I have just one variable, j, but two input files. Is there any way to load two variables (*list, *bam) in one for loop for each sample?

I hope my question is clear...

Thank you for any ideas and help.

Petr.

gatk script bash • 2.2k views
modified 3.2 years ago • written 3.2 years ago by Paul490
2
3.2 years ago by
Sudeep1.4k
Sudeep1.4k wrote:

Let's say that you have followed the same naming convention for your *list and *bam files, say something like experiment1.bam and its corresponding experiment1.list. Then you can create another variable for your "list file" on the fly inside the for loop:

for i in *.bam;
 do y=$(echo "$i" | sed 's/bam/list/g');   # experiment1.bam -> experiment1.list
 java -Xmx32g -jar "$gatk" -T IndelRealigner -I "$i" -R "$reference" -targetIntervals "$y" -o "${i%.bam}.realignedBam.bam";   # ${i%.bam} strips the .bam suffix
done;

This assumes that you have a file called experiment1.bam and its list file experiment1.list in your current working directory. In each iteration of the for loop, the variable i holds the string experiment1.bam; in the second line we copy it into another variable y while replacing bam with list. So if everything is correct, i holds the path to experiment1.bam and y holds the path to experiment1.list, and both are then passed as arguments to GATK.
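To check the name substitution without running GATK, a small dry run can print the pairs the loop would produce. This is only a sketch: bam_to_list is a hypothetical helper name, and the sample names are made up for illustration.

```shell
#!/usr/bin/env bash
# bam_to_list is a hypothetical helper, not part of GATK.
bam_to_list() {
    # Same substitution as in the loop above: replace "bam" with "list".
    echo "$1" | sed 's/bam/list/g'
}

# Made-up sample names, used only for illustration.
for i in experiment1.bam experiment2.bam; do
    y=$(bam_to_list "$i")
    # Print the two arguments that would be handed to IndelRealigner.
    echo "-I $i -targetIntervals $y"
done
```

If the printed pairs do not match the files actually present in the directory, the naming convention differs from what this approach assumes.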

modified 3.2 years ago • written 3.2 years ago by Sudeep1.4k

Wow, could you please explain what is happening here: y=$(echo $i|sed 's/bam/list/g'); ?? Thank you!

modified 3.2 years ago • written 3.2 years ago by Paul490
1

It finds the string "bam" in the value of $i and replaces it with "list"; the result is stored in y. The alternative approach would be to just strip the extension off:

for i in *.bam
do
    base_name=${i%.bam}   # strip the ".bam" suffix: experiment1.bam -> experiment1
    java -Xmx32g -jar "$gatk" -T IndelRealigner -I "$i" -R "$reference" -targetIntervals "$base_name.list" -o "$base_name.realignedBam.bam"
done
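Before launching the real jobs, it may be worth doing a dry run that only prints each command and skips BAM files without a matching list file. The following is a sketch under assumptions: build_cmd is a hypothetical helper, and the fallback values for $gatk and $reference are placeholders, not paths from this thread.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints each command instead of running GATK.
# build_cmd is a hypothetical helper name.
build_cmd() {
    local bam=$1
    local base_name=${bam%.bam}   # strip the trailing ".bam" only
    echo "java -Xmx32g -jar $gatk -T IndelRealigner -I $bam -R $reference -targetIntervals $base_name.list -o $base_name.realignedBam.bam"
}

gatk=${gatk:-GenomeAnalysisTK.jar}        # placeholder path (assumption)
reference=${reference:-reference.fasta}   # placeholder path (assumption)

for i in *.bam; do
    # If no BAM files exist, the glob stays as the literal "*.bam"; skip it.
    [ -e "$i" ] || continue
    if [ ! -f "${i%.bam}.list" ]; then
        echo "skipping $i: no ${i%.bam}.list found" >&2
        continue
    fi
    build_cmd "$i"
done
```

Once the printed commands look right, the echo inside build_cmd can be replaced by the actual java call.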
modified 3.2 years ago • written 3.2 years ago by Devon Ryan63k

I have edited my original post

modified 3.2 years ago • written 3.2 years ago by Sudeep1.4k

Thank you guys!! That works perfectly fine for me!!! Just add a semicolon between do list_name=${j%.bam}**;** java -Xmx32g -.......

modified 3.2 years ago • written 3.2 years ago by Paul490

Hmm, do you really need the g option in sed? It should work fine without it.

written 3.2 years ago by PoGibas4.4k

I know, it doesn't make much sense here, but I got used to using g in almost all cases, can't help it :)
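For names like experiment1.bam the g flag makes no difference, but it starts to matter when "bam" also appears elsewhere in the file name. A small comparison, using a made-up name that is an assumption for illustration only:

```shell
#!/usr/bin/env bash
# Made-up file name that contains "bam" twice.
name=bam_sample.bam

without_g=$(echo "$name" | sed 's/bam/list/')      # replaces the first "bam" only
with_g=$(echo "$name" | sed 's/bam/list/g')        # replaces every "bam"
anchored=$(echo "$name" | sed 's/\.bam$/.list/')   # replaces only a trailing ".bam"
expanded=${name%.bam}.list                         # pure Bash, no sed needed

echo "$without_g"   # list_sample.bam
echo "$with_g"      # list_sample.list
echo "$anchored"    # bam_sample.list
echo "$expanded"    # bam_sample.list
```

Neither unanchored form gives the intended list name here, which is one reason anchoring the pattern or stripping the suffix with ${name%.bam} is the safer habit.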

written 3.2 years ago by Sudeep1.4k