I've been trying to solve this problem for a month now, so I thought it'd be time to ask for some help.
I've got a dataset that looks like this (anonymized with x):
    ID           ID-87xxxxx  ID-88xxxxx  ID-87xxxxx  ID-96xxxxx
    IndividualA  2           1           2           0
    IndividualB  1           1           1           1
    IndividualC  0           2           2           0
    IndividualD  0           0           0           1
    IndividualE  1           1           1           2
    IndividualF  1           1           2           1
    IndividualG  2           0           1           0
    IndividualH  1           1           0           1
The 0, 1 and 2 depict zygosity. Each column represents a marker. For any marker, an individual's genotype is coded as the number of copies of the second allele, meaning:
- 0: homozygote for the first allele
- 1: heterozygote
- 2: homozygote for the second allele
- 5: unknown
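
To make the coding concrete, here is a minimal Python sketch of how I understand it, using IndividualA's row from the table. The allele labels "01" and "02", and "00" for unknown, are placeholders I made up for illustration; they do not come from my data or from any particular program's specification.

```python
# Map each allele-count code to a pair of allele labels.
# ASSUMPTION: "01" = first allele, "02" = second allele, "00" = unknown.
# These labels are arbitrary placeholders chosen for this example.
COUNT_TO_ALLELES = {
    0: ("01", "01"),  # homozygote for the first allele
    1: ("01", "02"),  # heterozygote
    2: ("02", "02"),  # homozygote for the second allele
    5: ("00", "00"),  # unknown genotype
}


def decode_row(counts):
    """Turn one individual's allele counts into diploid genotype strings."""
    return ["".join(COUNT_TO_ALLELES[c]) for c in counts]


individual_a = [2, 1, 2, 0]
print(decode_row(individual_a))  # ['0202', '0102', '0202', '0101']
```

So each number is just shorthand for a two-allele genotype at a biallelic marker, and what I am looking for is a tool that does this kind of expansion and writes the result in a format other programs accept.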
I have 55k+ SNPs and several thousand individuals (each with their own unique 14-character code).
- What is the name of this type of data? (Is it allele count?)
- How do I convert this kind of data into something else? I am going to use NeEstimator, Structure and other software, and none of them accept this format. It would be great to convert it to an intermediate format that I can then convert to whatever each program needs (I know GENEPOP does this well).
- Is there any program that makes use of this format?
Thank you for reading, and for any help you may provide. I have tried looking for answers to these questions for a long time now.