Difference Between GATK, Freebayes and SAMtools mpileup
8.7 years ago

Hi,

I'm new to NGS data analysis and am studying how the data are analysed, starting from the very basics.

There are many tools for aligning raw reads, and each is based on a different algorithm. For example, BWA is based on the Burrows-Wheeler transform (BWT), Novoalign on Needleman-Wunsch, and MOSAIK on Smith-Waterman.
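To make the aligner comparison a little more concrete, here is a minimal sketch of Smith-Waterman local alignment scoring. It is a textbook version with a linear gap penalty, not the code of MOSAIK or any other aligner; the function name and the scoring values (match=2, mismatch=-1, gap=-2) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Textbook Smith-Waterman with a linear gap penalty. The scoring
    values are illustrative assumptions, not any aligner's defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local: a bad
            # prefix can be dropped instead of dragging the score down.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "ACGT"))
```

Needleman-Wunsch fills a similar matrix but without the clamp at zero and with gap penalties along the borders, so it scores the alignment of the two sequences end to end (global) rather than the best matching substring (local).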

What about variant callers? Most people use GATK, SAMtools mpileup, or FreeBayes for variant calling. I have gone through their websites, but I can't find which algorithm each one uses to detect variants, except for FreeBayes, which states that it uses a Bayesian approach. I would like to know whether all variant callers are based on a Bayesian algorithm or whether they use different algorithms to find variants.
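For readers wondering what "Bayesian" means in this context, here is a minimal sketch of Bayesian genotype calling at a single diploid, biallelic site. It is a simplified textbook model and not the implementation used by GATK, SAMtools, or FreeBayes. The function names, the heterozygosity value `theta=0.001`, and the assumptions that sequencing errors are independent and spread evenly over the three other bases are all illustrative.

```python
import math


def phred_to_error(q):
    """Convert a Phred base quality to an error probability."""
    return 10 ** (-q / 10)


def genotype_posteriors(bases, quals, ref, alt, theta=0.001):
    """Posterior probability of genotypes RR, RA, AA given a pileup.

    Assumptions (illustrative only): errors are independent across
    reads, an error yields each of the 3 other bases with equal
    probability, and theta is a hypothetical heterozygosity prior.
    """
    genotypes = {"RR": (ref, ref), "RA": (ref, alt), "AA": (alt, alt)}
    priors = {"RR": 1 - 1.5 * theta, "RA": theta, "AA": theta / 2}
    log_post = {}
    for name, (a1, a2) in genotypes.items():
        total = math.log(priors[name])  # start from the log prior
        for base, q in zip(bases, quals):
            e = phred_to_error(q)
            p1 = 1 - e if base == a1 else e / 3
            p2 = 1 - e if base == a2 else e / 3
            # Each read samples one of the two chromosomes with prob. 1/2.
            total += math.log(0.5 * p1 + 0.5 * p2)
        log_post[name] = total
    # Normalise in log space to avoid underflow at high coverage.
    top = max(log_post.values())
    unnorm = {g: math.exp(v - top) for g, v in log_post.items()}
    norm = sum(unnorm.values())
    return {g: v / norm for g, v in unnorm.items()}


def call_genotype(bases, quals, ref, alt, theta=0.001):
    """Return the genotype with the highest posterior probability."""
    post = genotype_posteriors(bases, quals, ref, alt, theta)
    return max(post, key=post.get)


if __name__ == "__main__":
    print(call_genotype("AGAGAGAGAG", [30] * 10, "A", "G"))
```

The posterior is the per-read likelihood multiplied by a prior and then normalised. Callers built on this idea can still differ in how they model errors, indels, and priors.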

NGS Variant Caller • 8.1k views

Most of the code for GATK is available on github.


Sorry but how is this answer relevant to the question? I'm moving it to a comment.

8.7 years ago

I found that these are the major differences between GATK and samtools:

  1. Preprocessing of alignments: GATK drops reads with low mapping quality, but samtools uses all reads by default.
  2. SNP genotype likelihood model: GATK assumes sequencing errors are independent, while samtools assumes that a second error occurs with higher probability.
  3. Indel genotype likelihood model: this is one of the major differences between samtools and GATK. The samtools model is derived from BAQ; GATK's model was derived from Dindel's model.
  4. Filtering: samtools uses hand-tuned filters, while GATK learns filters from data. GATK's approach is more convenient and powerful, at least for human variant calling, where you have enough data to train the model and you do not need non-polymorphic sites.
  5. Other GATK-specific features: haplotype caller and indel realignment, among many others.
  6. Other samtools-specific features: genotype-free analysis, physical phasing, and so on.
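As a rough illustration of the first point (dropping reads with low mapping quality), the sketch below counts bases at one site with and without a mapping-quality cutoff. The `PileupRead` type, the `pileup_counts` function, and the cutoff of 20 are hypothetical and chosen for the example; they are not the actual defaults or code of GATK or samtools.

```python
from collections import Counter
from typing import NamedTuple


class PileupRead(NamedTuple):
    """One read's contribution at a site (hypothetical helper type)."""
    base: str
    mapq: int  # mapping quality of the read's alignment


def pileup_counts(reads, min_mapq=0):
    """Count observed bases, ignoring reads below min_mapq."""
    return Counter(r.base for r in reads if r.mapq >= min_mapq)


# Toy pileup: the two G observations come from poorly mapped reads.
reads = [
    PileupRead("A", 60),
    PileupRead("A", 60),
    PileupRead("G", 3),
    PileupRead("G", 0),
    PileupRead("A", 37),
]

all_reads = pileup_counts(reads)              # every read is used
filtered = pileup_counts(reads, min_mapq=20)  # illustrative cutoff only

if __name__ == "__main__":
    print(all_reads)
    print(filtered)
```

With every read included, the site looks like a possible A/G variant; with the cutoff, the evidence for G disappears. Preprocessing differences like this can change calls before any likelihood model is applied.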

Nice summary! I forgot to point you to this thread, "Difference Between Samtools And Gatk Algorithms", sorry :) Regarding indels, please keep in mind that GATK can easily be fooled by a lack of coverage and by short deletions. This is clearly visible when calling variants in closely spaced exon regions.


Nice find. I remember reading it long ago, but I'd forgotten it existed :)


Good job - this is very informative. Thank you!

8.7 years ago
H.Hasani ▴ 990

Hi,

As I use GATK and Samtools, I can recommend the following papers. I have never used FreeBayes, though:

Samtools: "A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data", PMID: 21903627

GATK: "The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.", PMID: 20644199

HTH
