Forum: Why does GATK make things more complicated?
3.6 years ago
Learner ▴ 250

I have been busy with this package for some time now. No offense, but I find it bloody useless, or at least far from straightforward to use.

1- How can one ask for help? (If you google how to ask a question, you land on a page with nothing that lets you ask one, even though I registered a week ago.) There are many, many issues: one person says to use Queue, another says don't use it.

One says align, the other says there is no need, and I have read a few papers in high-end journals that all used different platforms!

2- All their ridiculous posts are old, useless and unclear.

3- Version 3 is different from version 4, yet for both you must have Java 1.8 (don't use anything higher). If you use bwa or whatever, you cannot find a way to use GATK for alignment. I have been killing myself to get it to work, with no success! If you know how, please show me.

I even tried an older version, but I cannot get that to work either. Could someone direct me to a good source for it, or show me how to report my errors on the GATK website?

I am even trying to use the cloud now; if anyone knows how to do that, I would appreciate any guide.

OK, it seems I need to take some serious action :-) I have summarised the steps I could find; now, if you could please help me find the command for each step, we can all go home in peace :-)

Step 1 ----> Get your data (forward and reverse reads)

Step 2 ----> Map with BWA mem (don't do it with the other BWA algorithms)

Step 3 ----> SAM to BAM (you can use whatever you like; I use BWA)

Step 4 ----> Mark duplicate reads (I use Picard)

Step 5 ----> Use the samtools flagstat command to print descriptive information for the BAM dataset you generated in step 3

Step 6 ----> samtools mpileup: multi-way pileup of variants

Step 7 ----> VarScan for variant detection

Step 8 ----> Annotate (you can use so many algorithms; which one is the best? God knows, maybe VCFannotateGenotypes, I don't know)

Step 9 ----> You can filter your VCF data on a variety of attributes (is it necessary? God knows again)

Step 10 ---> ANNOVAR: annotate the VCF

Step 11 ---> Go get a damn beer because you went through a lot
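
For what it is worth, steps 2 to 7 could look roughly like the sketch below. It only builds and prints the shell commands; nothing is run. The function name, the file names, the sample name `s1` and the jar paths `picard.jar` and `VarScan.jar` are placeholders made up for illustration, and the options should be checked against the help output of whatever versions are installed.

```python
import shlex


def samtools_varscan_steps(ref, fq1, fq2, sample,
                           picard_jar="picard.jar", varscan_jar="VarScan.jar"):
    """Build (label, shell command) pairs for steps 2-7; nothing is executed.

    All file names and jar paths are placeholders, not fixed conventions.
    """
    q = shlex.quote
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    dedup, metrics = f"{sample}.dedup.bam", f"{sample}.dup_metrics.txt"
    pileup, vcf = f"{sample}.mpileup", f"{sample}.varscan.vcf"
    return [
        # bwa needs an index of the reference before it can map anything
        ("index reference", f"bwa index {q(ref)}"),
        ("step 2: map", f"bwa mem {q(ref)} {q(fq1)} {q(fq2)} > {q(sam)}"),
        ("step 3: SAM to sorted BAM", f"samtools sort -o {q(bam)} {q(sam)}"),
        ("step 4: mark duplicates",
         f"java -jar {q(picard_jar)} MarkDuplicates "
         f"I={q(bam)} O={q(dedup)} M={q(metrics)}"),
        ("step 5: flagstat", f"samtools flagstat {q(bam)}"),
        ("step 6: pileup",
         f"samtools mpileup -f {q(ref)} {q(dedup)} > {q(pileup)}"),
        ("step 7: VarScan",
         f"java -jar {q(varscan_jar)} mpileup2snp {q(pileup)} "
         f"--output-vcf 1 > {q(vcf)}"),
    ]


if __name__ == "__main__":
    for label, cmd in samtools_varscan_steps("ref.fa", "r1.fq", "r2.fq", "s1"):
        print(f"# {label}\n{cmd}")
```

Steps 8 to 10 are left out on purpose, since which annotation tool and which filters to use is exactly the open question.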

Thanks

genomics Forum • 1.8k views
4

Why is it complicated indeed?

It is a good question, one that cuts to the very core of how science is practiced today. The short answers, in my opinion, are that there is no incentive or reward for making code simple and, in addition, that a large number of the people who use GATK directly compete with the organization that makes GATK. Since the developers already know how it works, all the effort going into making it "simple" would be spent on the competition, a disincentive if there ever was one.

Think about the absurdity of how the supposedly sophisticated GATK code will not run if you don't have a .fai file and a .dict file (each of which would take seconds to generate, and the extra time for generating them is a rounding error compared with how long GATK runs). But nah, they would rather produce an insane error message (20 seconds into the run) sending you to an outdated link that does not actually explain anything. They already know everyone has this problem, but instead of fixing it, you have got to hit the manual, you have got to hit the books to find out how to create those files.
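
To show how little is involved, here is a minimal sketch of the check a wrapper could do before launching GATK. It assumes `samtools` and a GATK 4 `gatk` launcher are on the PATH, and that the dictionary sits next to the FASTA with the extension swapped for `.dict`; the function name is made up. It only returns the commands for whichever index file is missing and does not run them.

```python
from pathlib import Path


def missing_reference_index_commands(fasta):
    """Return the commands (as argv lists) needed to create a missing
    .fai index and/or .dict sequence dictionary for a reference FASTA."""
    fasta = Path(fasta)
    fai = Path(str(fasta) + ".fai")        # ref.fasta -> ref.fasta.fai
    seq_dict = fasta.with_suffix(".dict")  # ref.fasta -> ref.dict
    commands = []
    if not fai.exists():
        commands.append(["samtools", "faidx", str(fasta)])
    if not seq_dict.exists():
        commands.append(["gatk", "CreateSequenceDictionary",
                         "-R", str(fasta), "-O", str(seq_dict)])
    return commands


if __name__ == "__main__":
    for argv in missing_reference_index_commands("ref.fasta"):
        print(" ".join(argv))
```

Passing each returned list to `subprocess.run(argv, check=True)` would actually create the files.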

0

@Istvan Albert I appreciate your time, thanks. I personally think these tools were meant to make life easier, not more complicated. What is the point of spending hours and hours and still thinking that you are doing something wrong :-) ? I just cannot believe the amount of NIH grant money wasted on such stuff while people still need to crack a mystery :-( Not cool at all! I wish NIH could give me some :-D

0

Not sure how much NIH money went toward GATK, definitely >$1M (http://grantome.com/grant/NIH/U01-HG006569-01), but maybe not for the initial development.

0

I just today ran GATK, which explicitly told me that a .fai file was necessary (and a couple of minutes later reminded me that a .dict file is necessary). It would be useful if it could create those by itself, indeed.

3
3.6 years ago

how one can ask for a help ?

all their ridiculous post are old , useless and not clear 3- version 3 is different from version 4,

Agreed, but they keep improving the way GATK works. Old versions were not able to run with new technologies like Spark.

yet both you must have java 1.8 (don't use any higher)

Because Oracle has deprecated/removed many classes since Java 1.8.

I keep using GATK 3.8. It works fine for my needs.

0

@Pierre Lindenbaum thanks for your reply, I liked it. I have gone to that link 100 times; there is no tab, nothing at the bottom of their page that allows me to write a request, comment or whatever. I asked for the confirmation at least 100 times and I never get any email. I registered with two email addresses, personal and work; nothing happened. Do you have a workflow for the analysis? I ran bwa, converted to BAM and indexed, then tried to align with GATK; it did not work. I converted to BAM with Picard and ran GATK; it did not work. I tried GATK 3.8, 3.7 and 4; none of them worked for me.

1

Do you have any workflow how to do the analysis?

https://github.com/gatk-workflows/

then tried to align with GATK did not work

https://meta.stackexchange.com/questions/147616

0

@Pierre Lindenbaum I showed the error here: "C: how to perform the RealignerTargetCreator when there is not this algorithm anymo". Now I am trying to understand how they run their WDL scripts. Unfortunately it is not a good workflow whatsoever: no example, no description, and look at their code! So messy.

0

The GATK documentation is confusing and needs to improve in its clarity - I agree with you. I ceased using GATK a few years ago and now use BCFtools mpileup / call for calling heterozygous/homozygous variants, pindel for calling indels, and then, e.g., SomaticSniper for calling somatic variants.

3
3.6 years ago

I do not agree that it's too complicated. It's a huge ecosystem, so obviously they cannot fit everything on one manual page. They also need to support multiple versions of the tools. You cannot remove everything from GATK 3.8 as soon as GATK 4 is available. People rely on older versions or older settings. Technically, indel realignment is deprecated, but as far as I know they still keep it available because people are used to it. Thousands of genomes and tens of thousands of exomes have been analyzed using these tools, so surely it can't be that bad? They give training all over the world, there are multiple tutorials on YouTube, and they have their own support forum on which they help users. For free, right?

Coincidentally, I installed GATK again today on two servers. Downloaded the zip file, activated the conda environment (installing all dependencies) and ready to roll. Searched Google for the tool I needed to convert gVCF to VCF (found it on their forum together with the command line usage). Executed the wrapper tool, fucked up some command line options because I wasn't paying attention, and got a helpful error message telling me what was wrong.

Your starting point should be their explanation on best practices. On the left of the page you'll find links to specific workflows, e.g. this one for germline SNP calling, telling you exactly what you need to do in which order. Note that there is also a snakemake best practices workflow for GATK variant calling.

2
3.6 years ago

1- how one can ask for a help ?

You can just ask here, like you are doing now ;)

2- all their ridiculous post are old , useless and not clear

3- version 3 is different from version 4,

Every program you use differs between versions. So whenever you find a tutorial or code example, the first thing you have to do is check whether the command and its arguments are valid for the version you are using. Most programs print a help page when you type <programname> --help or just <programname> without any parameter. Check this first!
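
A rough sketch of automating that check: grab the help text of the installed tool and list the options from a tutorial command that do not show up in it. The function names and the help text in the usage example are made up for illustration, and it assumes the tool prints its help for `--help` (some tools use `-h` or no argument at all, hence the `help_flag` parameter).

```python
import re
import subprocess


def help_text(tool, help_flag="--help"):
    """Return whatever the tool prints for its help flag.

    Many tools write their help to stderr, so both streams are collected.
    """
    result = subprocess.run([tool, help_flag], capture_output=True, text=True)
    return result.stdout + result.stderr


def flags_missing_from_help(text, flags):
    """Return the flags that do not appear as whole tokens in the help text."""
    missing = []
    for flag in flags:
        # The lookarounds stop "--out" from matching inside "--output".
        pattern = r"(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
        if not re.search(pattern, text):
            missing.append(flag)
    return missing


if __name__ == "__main__":
    # Invented help text, standing in for help_text("sometool").
    example = "usage: sometool [--input FILE] [--output FILE]\n  -t INT  threads"
    print(flags_missing_from_help(example, ["--input", "--out", "-t"]))
```

A flag that is reported missing is a hint that the tutorial was written for another version, not proof; the help page itself is still the thing to read.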

if you use bwa or whatever, you cannot find a way to use GATK for alignment

I guess you are talking about the IndelRealignment, like in your other post? This step was in the old best practice guide for GATK and was only recommended when using the UnifiedGenotyper. This variant caller is no longer available, in favor of HaplotypeCaller. The latter does a de novo assembly around an indel. This is why GATK doesn't include the Realigner anymore. HaplotypeCaller is also available in GATK 3. If you have to use GATK, I strongly recommend using this variant caller.

The GATK best practice guide is confusing for users new to this field, because it involves so many steps. Several steps are only necessary in individual, special cases. So here is my recommendation for a minimal pipeline to start your analysis. Once you are through, read about the other steps mentioned in the best practice guide to decide whether they are necessary in your case:

1. map and align raw reads to reference genome
2. sort and convert alignment to bam
3. do a variant calling
4. try to filter false positive calls

That's all for the beginning. Everything else you can do to improve your results depends heavily on the source of your DNA, the library preparation, the sequencing platform and the goal of your analysis.
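
As a minimal sketch, the four steps above could translate into commands like the ones built below, assuming bwa, samtools and a GATK 4 `gatk` launcher are installed and the reference already has its bwa index, .fai and .dict files. The function name, the file names, the read group values and the `QD < 2.0` filter are only illustrative assumptions; the right filters depend on your data. The code prints the commands and runs nothing.

```python
import shlex


def minimal_gatk_steps(ref, fq1, fq2, sample):
    """Build shell commands for: map, sort/convert, call variants, filter."""
    q = shlex.quote
    bam = f"{sample}.sorted.bam"
    raw_vcf = f"{sample}.vcf.gz"
    filtered_vcf = f"{sample}.filtered.vcf.gz"
    # GATK refuses reads without a read group, so one is attached while
    # mapping; bwa itself expands the literal "\t" into tabs.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # steps 1 and 2: map, then sort and write BAM in one pipe
        f"bwa mem -R {q(read_group)} {q(ref)} {q(fq1)} {q(fq2)}"
        f" | samtools sort -o {q(bam)} -",
        # the BAM must be indexed before variant calling
        f"samtools index {q(bam)}",
        # step 3: variant calling
        f"gatk HaplotypeCaller -R {q(ref)} -I {q(bam)} -O {q(raw_vcf)}",
        # step 4: flag likely false positives (illustrative threshold)
        f"gatk VariantFiltration -R {q(ref)} -V {q(raw_vcf)} -O {q(filtered_vcf)}"
        f" --filter-name lowQD --filter-expression {q('QD < 2.0')}",
    ]


if __name__ == "__main__":
    for cmd in minimal_gatk_steps("ref.fa", "r1.fq", "r2.fq", "s1"):
        print(cmd)
```

VariantFiltration only marks the failing records in the FILTER column; it does not delete them, so the unfiltered calls stay available.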

fin swimmer

0

@finswimmer I liked your answer already, but look at my own steps, way much better :-) Now I need to put things together so that people will understand each step; if you could help, I would appreciate it highly.