What does 'SNP calling' mean, exactly? Can anyone explain it to me, or at least point me to a good explanation on the Internet?
I can't find a good definition on the web.
SNP calling is a bit of a misnomer, as it implies finding "SNPs" in NGS data. Without information about population frequency or function, it is premature to call a single nucleotide change a "polymorphism". With that caveat in mind, "SNP calling" in the context of NGS data analysis might be defined as the process of finding bases in the NGS data that differ from the reference genome, typically including an associated confidence score or statistical evidence metric. Since all NGS data have a nonzero error rate, this process requires that a given reference position be read by the NGS technology multiple times. The details of this analysis vary somewhat by application, but an early and still applicable description can be found in the paper by Heng Li describing the MAQ alignment and variant calling algorithm:
http://www.ncbi.nlm.nih.gov/pubmed/18714091
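To make the idea of "a difference from the reference plus a confidence score" concrete, here is a rough Python sketch. It is not the MAQ algorithm; it is a toy model that assumes every base has the same fixed error rate (the 1% default is made up for illustration) and asks how unlikely the observed number of mismatches would be if they were all sequencing errors. The function names are hypothetical.

```python
import math
from collections import Counter

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def call_site(ref_base, read_bases, error_rate=0.01):
    """Toy single-site caller (illustrative only, not MAQ).

    read_bases: the bases that the aligned reads show at one reference position.
    Returns None if no read disagrees with the reference, otherwise
    (alt_base, alt_count, depth, score), where score is the Phred-scaled
    probability of seeing at least alt_count mismatches from errors alone.
    """
    depth = len(read_bases)
    mismatches = Counter(b for b in read_bases if b != ref_base)
    if not mismatches:
        return None
    alt_base, alt_count = mismatches.most_common(1)[0]
    p_value = binomial_tail(depth, alt_count, error_rate)
    # Clamp to avoid log10(0) when the p-value underflows.
    score = -10 * math.log10(max(p_value, 1e-300))
    return alt_base, alt_count, depth, score

# One mismatch in ten reads is weak evidence; six in ten is strong evidence.
print(call_site("A", "AAAAAAAAAG"))
print(call_site("A", "AAAAGGGGGG"))
```

This also shows why multiple reads per position are needed: with a single read, one mismatch cannot be distinguished from an error.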
Sean
How much information do you need? It's the identification of single nucleotide polymorphisms that are due to genuine sequence level variation rather than errors produced by the underlying sequencing technology. We used to spend time ogling sequence traces and alignments to do this with capillary style sequence data.
With NGS data, this tends to be deferred to a pipeline such as MAQ, which aligns read data to a reference sequence and uses quality score information from the reads to decide whether an observed difference is a SNP or not. The algorithms used for this differ widely in their results.
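As a rough illustration of how per-base quality scores can feed into that decision, here is a simplified Python sketch. It is not MAQ's actual model. It assumes Phred-scaled qualities (error probability 10^(-Q/10)), treats reads as independent, and only compares two hypotheses: the site is entirely the reference base, or entirely the alternative base. The function names are hypothetical.

```python
import math

def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    # Cap at 0.75 so a Q0 base is simply uninformative and log10(0) is avoided.
    return min(10 ** (-q / 10), 0.75)

def log10_likelihood(true_base, bases, quals):
    """log10 P(observed bases | every read really comes from true_base)."""
    total = 0.0
    for base, q in zip(bases, quals):
        e = phred_to_error(q)
        # A correct read has probability 1 - e; an error is assumed to be
        # equally likely to produce any of the other three bases.
        total += math.log10(1 - e if base == true_base else e / 3)
    return total

def alt_vs_ref_score(ref_base, alt_base, bases, quals):
    """Positive values favour the alternative base, negative the reference."""
    return (log10_likelihood(alt_base, bases, quals)
            - log10_likelihood(ref_base, bases, quals))

# Two high-quality G reads outweigh one A read; two low-quality ones do not.
print(alt_vs_ref_score("A", "G", "GGA", [40, 40, 40]))
print(alt_vs_ref_score("A", "G", "GGA", [5, 5, 40]))
```

Real callers differ in how they model genotypes, mapping quality and correlated errors, which is one reason their results diverge.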
Thank you very much for the answer.