Question: How Is Heterozygosity Determined In A Consensus Fasta?
0
gravatar for Justin
5.4 years ago by
Justin440
United States
Justin440 wrote:

I have a consensus fasta file generated by samtools/bcftools/vcfutils as discussed in a previous post here: How to generate a consensus fasta sequence from SAM tools pileup?

In some sites, I see e.g. R, which means that site was heterozygous A/G.

How does it determine it's heterozygous? E.g. if you see 50 A's and 39 G's, how do you know it's a het? By using a minor allele % threshold? Or using a probability model? If it uses probability, are there any references out there that describe the math?

consensus samtools • 1.8k views
ADD COMMENTlink modified 5.4 years ago by Sean Davis25k • written 5.4 years ago by Justin440
1
gravatar for Sean Davis
5.4 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

See this link:

http://sourceforge.net/apps/mediawiki/samtools/index.php?title=SAM_FAQ#How_are_SNPs_and_indels_called_and_filtered_by_SAMtools.3F

In short, the MAQ paper is where the method is described:

http://www.ncbi.nlm.nih.gov/pubmed/18714091

ADD COMMENTlink written 5.4 years ago by Sean Davis25k

Thanks!

For those interested in the math, see the heading "Consensus genotype calling" in the "Methods" section of the paper.

Basically, it's a Bayesian probability model where you want P(genotype g | observed data D), which you can get from Bayes' theorem if you have P(D | g), which the paper explains how to get. Then you estimate g as g* = argmax[g] P(g | D)

ADD REPLYlink written 5.4 years ago by Justin440
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 786 users visited in the last hour