Converting a big SNP genotype table to plink (.ped) file for ADMIXTURE
6.9 years ago
biomonte ▴ 220

Dear everyone!

I have a big SNP genotype table (about 6 million SNPs) with the following format:

SNP_ID    ind_1    ind_1    ind_2    ind_2    ind_3    ind_3
snp_1     A        A         A       A        G        A
snp_2     T        T         T       T        T        T
snp_3     A        G         G       A        G        G
...

Where each individual (ind) has two alleles (ind_1 & ind_1).

How can I convert this table to a plink (.ped) file? My aim is to use this file as input for ADMIXTURE; I am aware that I also have to create a (.map) file containing info on the SNPs, but I'm also not sure how to do it. Any tips? Additionally, I would like to cluster ind1 and ind2 together, but not ind3. A Perl-based script would be highly appreciated.

Here the parameters for the plink (.ped) file:

1) Family_ID=?
2) Individual_ID=?
3) Paternal_ID=0
4) Maternal_ID=0
5) Sex=2
6) Phenotype=1

Any help is very welcome; thank you so much! ☺

plink SNP PED format admixture perl • 2.7k views
6.9 years ago

Your table looks non-standard to me so I don't think there's a publicly available parser, but I might be wrong.

Here are some rules to get you going with your own parser. You'll have to transpose that table and add the leading columns you listed.

  • Only the combination of family and individual ID has to be unique, so you can set one of them to missing (-9 in plink) and use ind_1, ind_2, ind_3 etc. for the other. You can also set both to the same value (if your individuals aren't related).

  • ADMIXTURE can take sex chromosomes into account; see page 10 of the manual.

  • Phenotype shouldn't be 1 for everybody, but I guess ADMIXTURE doesn't care.

  • The map file has one row per SNP with four columns, in this order: chromosome, SNP name, genetic position in cM (can be 0), and position in bp; see this link. The map order and the ped order have to be identical, since the ped file doesn't store the SNP names. You'll need two columns in the ped file for each row in your map file; these two columns store the two alleles of the SNP.
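
To make the rules above concrete, here is a minimal sketch of the transposition in Python (not Perl, but the logic ports directly). The function name `table_to_ped_map` and the example string are hypothetical. It assumes the table is whitespace-separated, that each individual's two allele columns sit next to each other, and that family ID can equal individual ID. Your table has no chromosome or position columns, so the map rows below use 0 as a placeholder for all three; that is an assumption, and you should fill in real coordinates if you have them.

```python
import io


def table_to_ped_map(table, sex="2", phenotype="1"):
    """Transpose a SNP-by-allele table into PED and MAP lines.

    `table` is any iterable of text lines, e.g. an open file handle.
    """
    table = iter(table)
    header = next(table).split()
    sample_cols = header[1:]  # drop the SNP_ID column
    # Each individual must appear as two adjacent columns with the same name.
    individuals = sample_cols[0::2]
    if len(sample_cols) % 2 or individuals != sample_cols[1::2]:
        raise ValueError("expected two adjacent columns per individual")

    genotypes = [[] for _ in individuals]
    map_lines = []
    for line in table:
        fields = line.split()
        if not fields or fields[0] == "...":
            continue  # skip blank lines and the elision marker
        snp_id, alleles = fields[0], fields[1:]
        if len(alleles) != len(sample_cols):
            raise ValueError(f"wrong number of alleles for {snp_id}")
        # MAP columns: chromosome, SNP name, cM, bp.
        # All zeros here are placeholders (assumption: no coordinates known).
        map_lines.append(f"0\t{snp_id}\t0\t0")
        for i in range(len(individuals)):
            genotypes[i].extend(alleles[2 * i:2 * i + 2])

    ped_lines = []
    for ind, geno in zip(individuals, genotypes):
        # PED columns: family ID, individual ID, father, mother, sex, phenotype.
        ped_lines.append(" ".join([ind, ind, "0", "0", sex, phenotype] + geno))
    return ped_lines, map_lines


EXAMPLE_TABLE = """\
SNP_ID    ind_1    ind_1    ind_2    ind_2    ind_3    ind_3
snp_1     A        A         A       A        G        A
snp_2     T        T         T       T        T        T
snp_3     A        G         G       A        G        G
"""

if __name__ == "__main__":
    ped, snp_map = table_to_ped_map(io.StringIO(EXAMPLE_TABLE))
    print("\n".join(ped))
    print("\n".join(snp_map))
```

This keeps every genotype in memory, which is fine for a test file but may be too heavy for 6 million SNPs and many individuals; in that case you would probably want to process the table in chunks or write a transposed format instead.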
