CASAVA 1.8 vs GATK
3
7
10.9 years ago

Illumina sequencers come with their own analysis pipeline, which uses ELAND2 for alignment and CASAVA for SNP calling.

On the other hand, GATK seems to be becoming one of the standard tools for SNP calling, where recalibration and local realignment make a difference in the quality of the SNP calls.

Questions: A) Alignment quality is important for SNP calling. Are there big differences between bwa and ELAND2 that would affect the SNP calling results?

B) Is CASAVA_1.8 comparable in SNP detection quality to GATK?

C) Overall question: is there any big advantage to using the Illumina pipeline (CASAVA, etc.), or, if you are used to bwa for alignment and GATK for SNP calling, would it be better to stay on that track?

next-gen sequencing illumina gatk bwa • 9.3k views
10
10.8 years ago
Allpowerde ★ 1.3k

I agree that the subsequent analysis is more important for the results than the initial variant calling. However, it is always good to actually do the comparison, especially since setting up the GATK-based pipeline is considerably more time-consuming than just using CASAVA and therefore needs to be justified.

So have a look at this document:

Bauer, Denis. Variant calling comparison CASAVA1.8 and GATK. Available from Nature Precedings http://dx.doi.org/10.1038/npre.2011.6107.1 (2011)

Abstract: This work aims at addressing the question of whether the new CASAVA1.8, which boasts improvements such as local realignments of reads, is at par with the well accepted pipeline of BWA mapping, duplicate removal, local realignment, re-calibration and variant calling using GATK. We therefore compare the two methods on chromosome 21 of a Yoruba trio and compare the results to the genotype identified by the 1000 genomes project.

We find that the mapping performance is the same for CASAVA1.8 and the academic pipeline, resulting in a mean coverage of about 22. CASAVA1.8 and GATK both call about 70.000 SNPs per individual of which 80% overlap between CASAVA1.8, GATK and the 1000 genomes project. This stands in contrast to the indel calling performance where CASAVA1.8 calls about 12,000 indels while GATK calls 16,000. Furthermore, CASAVA1.8 has a higher Mendelian error rate and frequently more than one alternative allele per locus indicating a non-optimal alignment.

We conclude that CASAVA1.8 has come a long way and can be considered a mature SNP calling approach. However, CASAVA1.8 does not deliver the same quality in the indel calling set compared to the newly incorporated Dindel-algorithm of GATK. It hence remains the best practice to use CASAVA1.8 for producing fastq files and switch at this stage to the academic tools for mapping, alignment improvement and variant calling.
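The two metrics the abstract relies on, call-set overlap and Mendelian error rate in a trio, can be illustrated with a small sketch. This is not the method used in the paper: the tuple representation of variants, the function names, and the choice of the three-way union as the overlap denominator are all assumptions made for illustration, and the data is made up.

```python
from itertools import product

def three_way_overlap(calls_a, calls_b, truth):
    """Fraction of the union of three call sets that all three share.

    The denominator (union of all sets) is an assumption; the paper may
    define its overlap percentage differently.
    """
    union = calls_a | calls_b | truth
    shared = calls_a & calls_b & truth
    return len(shared) / len(union) if union else 0.0

def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype is one maternal plus one paternal allele."""
    return any(sorted((m, f)) == sorted(child)
               for m, f in product(mother, father))

def mendelian_error_rate(trio_genotypes):
    """Share of sites where the child's genotype cannot come from the parents."""
    if not trio_genotypes:
        return 0.0
    errors = sum(not is_mendelian_consistent(c, m, f)
                 for c, m, f in trio_genotypes)
    return errors / len(trio_genotypes)

# Toy call sets keyed by (chromosome, position, ref, alt); values are invented.
casava = {("chr21", 100, "A", "G"), ("chr21", 200, "C", "T"), ("chr21", 300, "G", "A")}
gatk = {("chr21", 100, "A", "G"), ("chr21", 200, "C", "T"), ("chr21", 400, "T", "C")}
reference_panel = {("chr21", 100, "A", "G"), ("chr21", 200, "C", "T"), ("chr21", 500, "A", "C")}

# Toy trio genotypes as (child, mother, father) allele pairs.
trio = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    (("G", "G"), ("A", "A"), ("A", "G")),  # error: mother has no G
    (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
    (("C", "T"), ("C", "C"), ("C", "C")),  # error: no parent carries T
]

overlap = three_way_overlap(casava, gatk, reference_panel)
error_rate = mendelian_error_rate(trio)
print(f"three-way overlap: {overlap:.2f}, Mendelian error rate: {error_rate:.2f}")
```

In practice the sets would be parsed from the VCF files each pipeline produces, and indels would need normalization before positions can be compared directly.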

0

As you said, setting up the GATK-based pipeline is considerably more time-consuming than just using CASAVA. CASAVA also comes integrated with the Illumina cluster, so for HiScan and MiSeq the new CASAVA will probably "do the job" properly. Thanks. My doubt was not whether bwa-GATK was better but whether CASAVA should be avoided, and it seems that is not the case for 1.8.

0

Yes, CASAVA1.8 has come a long way. The only thing that might make a difference is that GATK can use multiple samples to generate the prior for calling a SNP, whereas CASAVA processes samples independently.

8
10.9 years ago

I found that some people (and this may or may not apply to you) tend to worry too much about picking one particular tool vs the other and spend too little time understanding the particularities and details of the tool that they picked.

Each of these tools works well. What you need to focus on is understanding each step of the analysis: verify each intermediate result, tweak the parameters, re-evaluate, and so on. This will have a far greater impact on the quality of the result than picking tool A vs tool B.

Some tools have better documentation, others may be easier for you to understand, or may have different run-time or performance characteristics. The best tool is the one you are most comfortable with.

1

+1, that is totally true, and this is why I would like to hear from people doing extensive work with CASAVA. What limitations and strengths have they found? I only know labs using samtools, Picard, and GATK pipelines, so I would like to hear what CASAVA users think.

0

So true... +1 for taking away my worries on what tool to use!

3
10.8 years ago
Ngsfan ▴ 30

Regarding that article, the results may not yet be comparable. From the GenomeWeb coverage linked below:

Additionally, he pointed out that the SNP call sets produced in the analysis are rather "raw" and weren't subjected to filtering, which would usually be done based on many covariates of error.

Similarly, the indel calls seems to have had some filtering but not enough, Banks said.

"The way it works in our pipeline is that the raw calls coming off the Unified Genotyper – GATK's SNP and indel caller — are meaningless," he explained. "We care about post-filtered calls ... the call set [Bauer] produced is pretty raw and isn't one we would have produced in our pipeline, for example."

http://www.genomeweb.com/informatics/comparison-broads-gatk-shows-illuminas-casava-18-good-snps-short-indels?page=2
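To make the point about raw versus post-filtered calls concrete, here is a minimal sketch of threshold-based filtering on per-call covariates. It is not the Broad's actual filtering procedure: the `RawCall` fields, the `passes_filters` helper, and every threshold value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RawCall:
    pos: int
    qual: float         # caller-reported variant quality
    depth: int          # read depth at the site
    strand_bias: float  # hypothetical strand-bias score; higher is worse

def passes_filters(call, min_qual=30.0, min_depth=8, max_depth=60,
                   max_strand_bias=0.9):
    """Apply simple hard thresholds; all default values are illustrative."""
    return (call.qual >= min_qual
            and min_depth <= call.depth <= max_depth
            and call.strand_bias <= max_strand_bias)

def filter_calls(calls, **thresholds):
    """Keep only the calls that pass every threshold."""
    return [c for c in calls if passes_filters(c, **thresholds)]

# Invented raw calls: one clean, one low quality, one with excess depth,
# one with strong strand bias.
raw_calls = [
    RawCall(pos=100, qual=55.0, depth=22, strand_bias=0.1),
    RawCall(pos=200, qual=12.0, depth=20, strand_bias=0.2),
    RawCall(pos=300, qual=80.0, depth=150, strand_bias=0.1),
    RawCall(pos=400, qual=60.0, depth=25, strand_bias=0.95),
]

filtered = filter_calls(raw_calls)
print(f"{len(raw_calls)} raw calls, {len(filtered)} after filtering")
```

Comparing two pipelines on their raw output and on their filtered output can give quite different pictures, which is the concern raised in the quote.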
