Question: Allele frequency calculation
3.6 years ago by
nkausthu30
nkausthu30 wrote:

I have ~200 exomes from both related and unrelated individuals. I have done joint genotype calling and calculated allele frequencies using VCFtools. But as it's a mixed population, what is the ideal way to calculate allele frequency? I would like to make an in-house variant database from our available exomes, and the corresponding allele frequencies will be used to filter variants. It would be helpful if you could give some further information about methods to adjust for relatedness. Thank you.

ADD COMMENTlink modified 3.5 years ago • written 3.6 years ago by nkausthu30

You could use GATK SelectVariants to subset the VCF file accordingly and then calculate the AF.

ADD REPLYlink written 3.6 years ago by Dave Tang190

Actually, what I would like to know is this: if I include the related individuals along with the unrelated individuals when calculating the allele frequency, will it be biased? And is there a statistical way to avoid this bias?

ADD REPLYlink written 3.6 years ago by nkausthu30

I would expect genotypes to be more similar amongst related individuals, so their alleles are not independent observations. Depending on what you want to do, there are methods for adjusting for relatedness.

ADD REPLYlink written 3.6 years ago by Dave Tang190

I would like to make an in-house variant database from our available exomes, and the corresponding allele frequencies will be used to filter variants. It would be helpful if you could give some further information about methods to adjust for relatedness. Thank you.

ADD REPLYlink written 3.6 years ago by nkausthu30

Please add this information to your initial post and try to be as informative as possible when asking questions. Those details are very important.

For filtering, you don't want related individuals to inflate the allele frequencies. I think the only correct way of creating such a database would be to count a variant shared by, e.g., three sibs just once. You should count in how many families each variant is observed, because observations within a family are not independent. An easier way (but you will lose information) would be to exclude related individuals, essentially choosing one individual per family at random.
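A minimal sketch of the family-level counting described above, assuming you already have a sample-to-family mapping and the genotype at one site coded as the number of alternate alleles. The function name, sample IDs and family IDs are hypothetical and only for illustration.

```python
def families_with_variant(alt_counts, family_of):
    """Return the set of families in which at least one member carries the variant.

    alt_counts: dict of sample -> alternate allele count (0, 1, 2, or None if missing)
    family_of:  dict of sample -> family ID (hypothetical IDs; an unrelated
                individual is simply a family of size one)
    """
    carriers = set()
    for sample, n_alt in alt_counts.items():
        if n_alt:  # skips both 0 (hom-ref) and None (missing call)
            carriers.add(family_of[sample])
    return carriers


# Made-up genotypes at a single site: three het sibs, one hom-ref, one hom-alt.
alt_counts = {"s1": 1, "s2": 1, "s3": 1, "s4": 0, "s5": 2}
family_of = {"s1": "famA", "s2": "famA", "s3": "famA", "s4": "famB", "s5": "famC"}

carrier_families = families_with_variant(alt_counts, family_of)
n_families = len(set(family_of.values()))

# Naive per-sample allele frequency, for comparison: the three sibs count three times.
naive_af = sum(alt_counts.values()) / (2 * len(alt_counts))

print(f"{len(carrier_families)} of {n_families} families carry the variant")
print(f"naive AF = {naive_af}")
```

Here the three sibs in famA contribute a single family-level observation, whereas the naive frequency counts each of them separately.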

ADD REPLYlink written 3.6 years ago by WouterDeCoster44k

You are absolutely right! Removing redundant variants from related individuals is something I thought about, but then the problem is which zygosity I should keep. For example, if the same variant is het/het/hom in three related individuals, which genotype do I keep and which do I remove? As you already said, if I consider only one individual from each family I will lose many variants. So I am a bit confused.

ADD REPLYlink written 3.6 years ago by nkausthu30

Don't count a variant twice if the two observations are from the same family; it still counts as one.

ADD REPLYlink written 3.6 years ago by WouterDeCoster44k

Just consider the following 3 scenarios:

  1. 3 related individuals - het/het/het - this will be taken as 1 allele count
  2. 3 related individuals - hom/hom/hom - this will be taken as 2 allele counts
  3. 3 related individuals - het/hom/het - what will be the allele count in this scenario?
ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by nkausthu30

I would simplify scenario 3 to 2 allele counts. But it's imperfect.
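A minimal sketch of that simplification, assuming each family is collapsed to the largest alternate allele count seen among its members, which reproduces the three scenarios (1, 2 and 2). Treating each family as one diploid individual in the denominator is my own assumption for illustration, and the function names are hypothetical.

```python
def family_allele_count(alt_counts):
    """Collapse one family's genotypes to a single allele count (max across members).

    alt_counts: list of alternate allele counts (0, 1, 2) with None for missing calls.
    """
    called = [n for n in alt_counts if n is not None]
    return max(called, default=0)


def family_adjusted_af(genotypes_by_family):
    """Allele frequency with each family counted as one diploid individual.

    Families with no called genotype at the site are left out of the denominator.
    Returns None if no family has a call.
    """
    called_families = [
        g for g in genotypes_by_family.values()
        if any(n is not None for n in g)
    ]
    if not called_families:
        return None
    total_alt = sum(family_allele_count(g) for g in called_families)
    return total_alt / (2 * len(called_families))


# The three scenarios from the thread.
print(family_allele_count([1, 1, 1]))  # het/het/het
print(family_allele_count([2, 2, 2]))  # hom/hom/hom
print(family_allele_count([1, 2, 1]))  # het/hom/het

# Made-up cohort at one site: two families and two unrelated individuals.
cohort = {"famA": [1, 1, 1], "famB": [0], "famC": [1, 2, 1], "famD": [0, 0]}
print(family_adjusted_af(cohort))
```

As noted, this is imperfect: taking the maximum overstates the count when only one of several relatives is homozygous, so it errs on the side of a higher frequency.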

ADD REPLYlink written 3.6 years ago by WouterDeCoster44k

You can adjust population allele frequencies for relatedness using the ideas described above. In my experience, and as seen in the 1000 Genomes Project, some people report their relatedness and ethnic group incorrectly. Because of this, I strongly recommend testing for relatedness directly from the VCF files you have. You can use KING (http://people.virginia.edu/~wc9c/KING/manual.html) for this. Given the inferred relatedness, you can use the a priori probabilities of allele sharing to correct the counts so that each allele is counted approximately once.
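Once pairwise kinship coefficients are available, the individuals can be grouped into families before counting. The following is a rough sketch, assuming the kinship estimates have already been parsed into (sample, sample, coefficient) tuples; it does not read KING output itself. The function name and example values are made up, and the default cutoff of 0.0884 (roughly second-degree relatives or closer) is an assumption you should check against the KING manual and your own data.

```python
def group_into_families(samples, kinship_pairs, threshold=0.0884):
    """Cluster samples into families by linking pairs with kinship >= threshold.

    samples:       iterable of sample IDs
    kinship_pairs: iterable of (sample_a, sample_b, kinship) tuples
    threshold:     assumed cutoff; pairs at or above it are treated as related
    Returns a sorted list of sorted member lists, one per family.
    """
    samples = list(samples)
    parent = {s: s for s in samples}

    def find(s):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b, kinship in kinship_pairs:
        if kinship >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    families = {}
    for s in samples:
        families.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in families.values())


# Made-up kinship estimates: s1, s2 and s3 are close relatives, s4 and s5 are not.
samples = ["s1", "s2", "s3", "s4", "s5"]
pairs = [("s1", "s2", 0.25), ("s2", "s3", 0.24), ("s4", "s5", 0.01)]
print(group_into_families(samples, pairs))
```

Linking is transitive, so s1 and s3 end up in the same family even without a direct pair between them. The resulting groups can be compared with the reported pedigrees to spot mislabelled relationships.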

Another thing to consider is storing the number of alternate-allele homozygotes and heterozygotes you observed in individuals with no pathology. These counts are useful for testing inheritance models and penetrance.

ADD REPLYlink written 3.6 years ago by Petr Ponomarenko2.6k

Your post does not explain who these people are or what you are trying to accomplish. Or even what kind of data you have. Please clarify it, in great detail.

ADD REPLYlink written 3.6 years ago by Brian Bushnell17k