Question: Understanding AWClust's SNP file data format and converting a PED file to it
5.3 years ago by
United States
aritra9060 wrote:


It would be great if anybody here could help me understand AWClust's input data format. It is described in the manual, but I don't completely understand it.

This is the description: 

The first row in the SNP file is contains names or IDs of the individuals in the dataset separated by white space. Each subsequent row represents a single SNP and the different alleles each individual has for that SNP, also separated by white space. The SNP information is encoded as numeric values (i.e. 0, 1, or 2) to represent the number of variant SNP alleles in genotypes (i.e. 0 implies that there are no SNP variants in the genotype, 1 for heterozygotes and 2 for homozygotes for SNP variants), and -1 is used to represent missing values.

They also give a sample snippet of the data: 

1 0 1 0 1
1 1 2 0 2
-1 1 0 1 2
1 1 2 1 1
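
To make the described layout concrete, here is a minimal Python sketch of a reader for it. It only follows the manual text quoted above (first row of IDs, then one row per SNP with values 0, 1, 2 or -1); it is not AWClust's own parser. The function name `parse_snp_matrix` and the header IDs `ind1` to `ind5` are made up for illustration, since the sample snippet shows genotype rows only.

```python
def parse_snp_matrix(text):
    """Read the described layout: first row = individual IDs, each later row = one SNP."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("empty input")
    ids, snp_rows = rows[0], rows[1:]
    genotypes = []
    for number, row in enumerate(snp_rows, start=1):
        if len(row) != len(ids):
            raise ValueError(f"SNP row {number} has {len(row)} values, expected {len(ids)}")
        values = [int(v) for v in row]
        if any(v not in (-1, 0, 1, 2) for v in values):
            raise ValueError(f"SNP row {number} has a value outside -1, 0, 1, 2")
        genotypes.append(values)
    return ids, genotypes


# Hypothetical header row added on top of the manual's sample genotype rows.
example = """ind1 ind2 ind3 ind4 ind5
1 0 1 0 1
1 1 2 0 2
-1 1 0 1 2
1 1 2 1 1
"""

ids, genotypes = parse_snp_matrix(example)
# genotypes[2][0] is -1, i.e. ind1 has a missing genotype at the third SNP.
```

Read this way, the file has one column per individual and one row per SNP, so four SNP rows for five individuals give a 4 x 5 table.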


I get the encoding logic, but I am not sure what it means for each row to represent a single SNP. Does that mean that, if I have a PED file like this:

FAM001  1  0 0  1  2  A A  G G  A C 
FAM002  2  0 0  1  2  A A  A G  0 0 

AWClust's input file format would be:

FAM001 FAM002

2 2 (for AA) 

2 2 (for AA)

2 1 (for GG)

1 2  (for AG)

2 0 (for AC)

Do we need another row for 0 0 as well?

I would really appreciate it if anybody could explain this to me and point me to a known tool for converting a PED file to this format.
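
For reference, here is a rough Python sketch of one way such a conversion could work, based only on my reading of the manual text quoted above. Several things here are assumptions and not confirmed AWClust behaviour: the "variant" allele is taken to be the less frequent allele at each SNP (ties broken alphabetically), a genotype with a `0` allele is treated as missing and written as -1, a SNP with only one observed allele is coded as 0 for everyone, and the column labels are built as `FID_IID`. The function names are made up for illustration.

```python
from collections import Counter

MISSING = "0"  # standard PED code for a missing allele


def ped_to_snp_matrix(ped_text):
    """Turn PED text into (ids, rows): one row per SNP, one value per individual."""
    ids, pairs_per_person = [], []
    for line in ped_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        alleles = fields[6:]  # the first six PED columns are FID, IID, father, mother, sex, phenotype
        if len(alleles) % 2:
            raise ValueError("odd number of allele columns")
        ids.append(f"{fields[0]}_{fields[1]}")  # assumed label: family ID + individual ID
        pairs_per_person.append(list(zip(alleles[0::2], alleles[1::2])))
    if not ids:
        raise ValueError("empty PED input")
    n_snps = len(pairs_per_person[0])
    if any(len(pairs) != n_snps for pairs in pairs_per_person):
        raise ValueError("individuals have different numbers of SNPs")

    rows = []
    for s in range(n_snps):
        column = [pairs[s] for pairs in pairs_per_person]
        counts = Counter(a for pair in column for a in pair if a != MISSING)
        # Assumption: the variant allele is the rarer one; None if the SNP is monomorphic.
        variant = min(counts, key=lambda a: (counts[a], a)) if len(counts) > 1 else None
        row = []
        for a1, a2 in column:
            if MISSING in (a1, a2):
                row.append(-1)  # missing genotype
            else:
                row.append((a1 == variant) + (a2 == variant))  # copies of the variant allele
        rows.append(row)
    return ids, rows


def format_snp_matrix(ids, rows):
    """Write the ID row followed by one whitespace-separated row per SNP."""
    lines = [" ".join(ids)]
    lines += [" ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)


ped_example = """FAM001  1  0 0  1  2  A A  G G  A C
FAM002  2  0 0  1  2  A A  A G  0 0
"""

ids, rows = ped_to_snp_matrix(ped_example)
print(format_snp_matrix(ids, rows))
```

Under these assumptions the two-individual PED example gives three SNP rows with two values each, and the `0 0` genotype becomes a -1 in the third row and does not get a row of its own. As far as I know, PLINK's `--recode A` also produces 0/1/2 allele counts, but with one row per individual and NA for missing values, so its output would still need to be transposed and recoded.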




ped plink allele snp • 1.7k views
modified 5.3 years ago by Biostar ♦♦ 20 • written 5.3 years ago by aritra9060