allele frequencies, genotype
7.4 years ago
prostoesh ▴ 20

Can anyone please explain what exactly the genotype information is? I need to compare allele frequencies (or something similar about genotypes) between 2 simulation scenarios, which are written in a very specific format generated by the msms tool.

Here is an example of a file:

ms 5 3 -t 5 -N 5000000 -I 3 5 0 0 -es 1.5 1 0.5 -es 7.5 1 0.5  [3.2rc Build:162]
0xebfc42a7c6a366af

//
segsites: 7
positions: 0.04375 0.04618 0.28698 0.68964 0.75697 0.77997 0.87296
0000100
0000100
1111011
0000100
0000100

//
segsites: 25
positions: 0.03118 0.08248 0.13261 0.24075 0.34263 0.40965 0.44703 0.49119 0.55406 0.55828 0.55906 0.58059 0.58536 0.63079 0.63707 0.65367 0.67934 0.72050 0.75345 0.78975 0.81020 0.83922 0.88021 0.94746 0.95751
0000100100010001001111001
0000000100111000001110001
0110000100010100001110001
0000000100111000001110001
1001011011000010110000110

//
segsites: 6
positions: 0.07350 0.15611 0.37691 0.80368 0.98965 0.99393
100000
001011
001111
001011
010000
genome SNP sequence • 1.5k views
7.4 years ago
Vitis ★ 2.5k

Looks like the first line gives the number of segregating sites, i.e. DNA sites in the genome that show a variant (SNP/indel) in the simulated population. The second line looks like allele frequencies at each site, and the remaining lines are genotypes (0 as A and 1 as a, or some other coding scheme). I wonder why the allele frequencies are not calculated directly from the genotypes; they must have been corrected by some model in your simulation. I strongly suggest reading the msms manual and paper before looking at the results.


Thank you for your reply! I really appreciate the help, because I've been trying to understand the msms program for some time now.

According to the manual, the second line isn't exactly allele frequencies: it gives the positions of the segregating sites on the simulated sequence, scaled to the (0,1) interval.
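For reference, the layout described here (a `//` separator, a `segsites:` line, a `positions:` line, then one 0/1 string per sampled chromosome) can be read with a small parser. This is only a minimal sketch in Python; the function name `parse_ms_output` and the dictionary keys are made up for illustration and are not part of msms, and the sample text is shortened from the file in the question.

```python
def parse_ms_output(text):
    """Split ms-style output into replicates (hypothetical helper, not part of msms)."""
    replicates = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "//":
            # Each "//" starts a new simulated replicate.
            current = {"segsites": 0, "positions": [], "haplotypes": []}
            replicates.append(current)
        elif current is None or not line:
            # Skip the command line, the seed line, and blank lines.
            continue
        elif line.startswith("segsites:"):
            current["segsites"] = int(line.split(":", 1)[1])
        elif line.startswith("positions:"):
            current["positions"] = [float(x) for x in line.split(":", 1)[1].split()]
        elif set(line) <= {"0", "1"}:
            # One string of zeros and ones per sampled chromosome.
            current["haplotypes"].append(line)
    return replicates


example = """ms 5 3 -t 5 -N 5000000 -I 3 5 0 0 -es 1.5 1 0.5 -es 7.5 1 0.5  [3.2rc Build:162]
0xebfc42a7c6a366af

//
segsites: 7
positions: 0.04375 0.04618 0.28698 0.68964 0.75697 0.77997 0.87296
0000100
0000100
1111011
0000100
0000100

//
segsites: 6
positions: 0.07350 0.15611 0.37691 0.80368 0.98965 0.99393
100000
001011
001111
001011
010000
"""

reps = parse_ms_output(example)
for rep in reps:
    print(rep["segsites"], len(rep["positions"]), len(rep["haplotypes"]))
```

Each replicate keeps its positions and haplotype strings together, so the number of positions can be checked against `segsites` before doing anything else with the data.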

Could you help me a little more, though, and explain how I get allele frequencies from a genotype (which is actually a haplotype) like this? The manual says

"the haplotypes of each of the sampled chromosomes are given, each as a string of zeros and ones. The ancestral state is coded with a zero, and the mutant, or derived state, indicated with a one"

What's the procedure to get the allele frequency? I'm quite confused.


Allele frequency is calculated per column of the haplotype block (one column is one segregating site). For example, for the first site (out of 7) in the first simulation, the frequency of allele 1 (the derived state) is 20% (1 out of 5) and the frequency of allele 0 (the ancestral state) is 80% (4 out of 5). Haplotype frequency is a bit more complicated, because you have to look at all sites together. Again using the first simulation as an example, there are two unique haplotypes in this data set: 0000100 and 1111011. Their haplotype frequencies are 80% (4 out of 5) and 20% (1 out of 5), respectively.
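The same counting can be sketched in a few lines of Python. This is only an illustration using the five haplotypes of the first simulation; the function names are hypothetical, and it assumes every haplotype string in a replicate has the same length.

```python
from collections import Counter

# Haplotypes of the first simulation in the question (5 chromosomes, 7 sites).
haplotypes = ["0000100", "0000100", "1111011", "0000100", "0000100"]


def derived_allele_frequencies(haplotypes):
    """Fraction of chromosomes carrying allele 1 at each site (column)."""
    n = len(haplotypes)
    # zip(*haplotypes) walks the block column by column.
    return [sum(int(c) for c in column) / n for column in zip(*haplotypes)]


def haplotype_frequencies(haplotypes):
    """Fraction of chromosomes carrying each distinct haplotype (row)."""
    n = len(haplotypes)
    return {hap: count / n for hap, count in Counter(haplotypes).items()}


print(derived_allele_frequencies(haplotypes))
# [0.2, 0.2, 0.2, 0.2, 0.8, 0.2, 0.2]
print(haplotype_frequencies(haplotypes))
# {'0000100': 0.8, '1111011': 0.2}
```

The frequency of allele 0 at a site is simply one minus the derived allele frequency. To compare two scenarios, the same functions could be applied to every replicate of each output file and the resulting frequencies summarized per scenario.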


Thanks a lot! That really helped!
