I have a matrix in which the rows are isolates and the columns are nucleotides at selected sites where homozygous variation has been detected. Is there a way to run an Fst test on it? I can export this matrix into R. I have never done an Fst analysis before.
Update: My data consist of 6 isolates, and for every isolate I have a VCF file listing its variants relative to the reference genome. A record looks something like this:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT whatever
Sample1 8139885 . A G 591.03 . AB=0.342857;ABP=18.0245;AC=1;AF=0.25;AN=4;AO=24;CIGAR=1X;DP=70;DPB=3323;DPRA=0;EPP=46.8017;EPPR=94.401;GTI=0;HWE=-0;LEN=1;MEANALT=1;MQM=255;MQMR=255;NS=1;NUMALT=1;ODDS=3.62626;PAIRED=1;PAIREDR=1;PAO=6.95324e-310;PQA=0;PQR=0;PRO=6.95324e-310;QA=920;QR=1770;RO=46;RPP=46.8017;RPPR=94.401;RUN=1;SAP=55.1256;SRP=102.898;TYPE=snp;XAI=0.00803798;XAM=0.0305247;XAS=0.0224867;XRI=0.00860706;XRM=0.0107998;XRS=0.00219274;technology.illumina=1;BVAR GT:DP:RO:QR:AO:QA 0/0/0/1:70:46:1770:24:920
This record corresponds to one position where a variant was found. Each of the 6 files lists the variants present in that isolate relative to the reference genome. As you can see, A and G are present at that position in roughly a 2:1 ratio, since there are 46 observations of A (RO) and 24 of G (AO), and the variant caller estimates the frequency of the alternate allele G to be 0.25 (AF).
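To make the arithmetic explicit, here is a minimal Python sketch that pulls the read counts out of the FORMAT and sample columns of the record above. It compares the fraction of reads carrying G with the frequency implied by the called genotype 0/0/0/1. The variable names are my own, and the sketch assumes the FORMAT keys line up one-to-one with the sample values, as they do in this record.

```python
# FORMAT and sample columns copied from the record above.
format_keys = "GT:DP:RO:QR:AO:QA".split(":")
sample_values = "0/0/0/1:70:46:1770:24:920".split(":")
call = dict(zip(format_keys, sample_values))

ref_reads = int(call["RO"])  # reads supporting the reference allele A
alt_reads = int(call["AO"])  # reads supporting the alternate allele G
depth = int(call["DP"])      # total read depth at this position

# Fraction of reads carrying the alternate allele (the AB value in INFO).
alt_read_fraction = alt_reads / depth

# Allele frequency implied by the called genotype: alt copies / ploidy.
alleles = call["GT"].split("/")
genotype_af = alleles.count("1") / len(alleles)

print(f"read ratio A:G    = {ref_reads / alt_reads:.2f}:1")
print(f"alt read fraction = {alt_read_fraction:.4f}")
print(f"genotype-based AF = {genotype_af:.2f}")
```

The read fraction (24/70) and the genotype-based frequency (1/4) are different quantities: the first is the raw read evidence, the second follows from the genotype the caller settled on.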
That said, this is an observation for the entire population sequenced by NGS. If the organism is tetraploid, my conclusion is that all individuals carry the G allele on one out of 4 chromatids and the A allele on 3 out of 4. There is not much more I can say here, is there? I do not know how many individuals are heterozygous A/G, homozygous A, homozygous G, and so on. I only know the frequency of allele A and the frequency of allele G.
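For context, a minimal sketch of how an Fst-style statistic could be computed from allele frequencies alone, without genotype counts. It assumes each isolate is treated as one equally weighted subpopulation and uses expected heterozygosity 2p(1-p) for a biallelic site, so Fst = (Ht - Hs) / Ht. The function name and the six frequencies are hypothetical placeholders for illustration, not values from my VCF files.

```python
def fst_from_frequencies(alt_freqs):
    """Fst-style estimate (Ht - Hs) / Ht for one biallelic site.

    alt_freqs: alternate-allele frequency in each subpopulation,
    all weighted equally (an assumption of this sketch).
    """
    n = len(alt_freqs)
    p_bar = sum(alt_freqs) / n

    # Expected heterozygosity of the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)

    # Mean expected heterozygosity within subpopulations.
    h_within = sum(2 * p * (1 - p) for p in alt_freqs) / n

    if h_total == 0:
        # Site is fixed everywhere, so Fst is undefined.
        return float("nan")
    return (h_total - h_within) / h_total


# Hypothetical alternate-allele frequencies for six isolates at one site.
hypothetical_freqs = [0.25, 0.25, 0.5, 0.0, 0.75, 0.25]
fst = fst_from_frequencies(hypothetical_freqs)
print(f"Fst-style estimate: {fst:.3f}")
```

This simple form ignores sample size and sequencing depth, so it is only a starting point for thinking about what the frequencies alone can support.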