Question: How to convert a haplotype file to PLINK format
5 months ago, bha (60) wrote:

I want to convert haplotype data to PLINK format (.map and .ped, or binary .fam, .bim and .bed). What is the best software or R package to do this easily? Has anybody come across this? The haplotype data is the output of HAPGEN2 (a program that simulates sequence data; unfortunately, it has no function to convert its output back to PLINK format).

bioinformatics plink R • 321 views

Is this the output file format: http://www.stats.ox.ac.uk/~marchini/software/gwas/file_format.html?

written 5 months ago by Gabriel R. (2.5k)

Yes! Do you have any idea how to convert this to PLINK format?

written 5 months ago by bha (60)

Could you post a small subset of your file somewhere?

written 5 months ago by Gabriel R. (2.5k)

The genotype file is in exactly the same format as described in the link above, and the haplotype file contains 0s and 1s in the standard format. The genotype file looks like:

SNP1 rs1 1000 A C 1 0 0 1 0 0
SNP2 rs2 2000 G T 1 0 0 0 1 0
SNP3 rs3 3000 C T 1 0 0 0 1 0
SNP4 rs4 4000 C T 0 1 0 0 1 0
SNP5 rs5 5000 A G 0 1 0 0 0 1

So, at SNP3 the two alleles are C and T, and the set of 3 probabilities for each individual corresponds to the genotypes CC, CT and TT respectively.

Note: columns 2 and 3 (which contain the RS ID and base-pair position of the SNPs) are set arbitrarily in this example.
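
For readers with the same file layout, here is a minimal Python sketch of how one line of the genotype file above could be turned into hard genotype calls. The function name `gen_line_to_calls` and the 0.9 threshold are illustrative assumptions, not part of HAPGEN2, GTOOL or PLINK; the only PLINK convention relied on is that a missing genotype is written as "0 0" in text files.

```python
def gen_line_to_calls(line, threshold=0.9):
    """Turn one genotype-file line into (snp_id, rs_id, position, calls).

    Layout assumed from the example above: SNP ID, RS ID, position,
    allele A, allele B, then three probabilities (AA, AB, BB) per individual.
    `threshold` is an arbitrary cut-off chosen for this sketch.
    """
    fields = line.split()
    snp_id, rs_id, position, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three probabilities per individual")

    genotypes = [(allele_a, allele_a), (allele_a, allele_b), (allele_b, allele_b)]
    calls = []
    for i in range(0, len(probs), 3):
        triple = probs[i:i + 3]
        best = max(range(3), key=lambda k: triple[k])
        # An uncertain call becomes ("0", "0"), the missing genotype in PLINK text files.
        calls.append(genotypes[best] if triple[best] >= threshold else ("0", "0"))
    return snp_id, rs_id, int(position), calls


# SNP3 from the example: first individual is CC, second is CT.
print(gen_line_to_calls("SNP3 rs3 3000 C T 1 0 0 0 1 0"))
```

With probabilities that are strictly 0 or 1, as in the file above, the threshold never triggers and every individual gets a definite call.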

written 5 months ago by bha (60)

And you have a sample file as well? Are these files simply zipped and not binary?

written 5 months ago by Gabriel R. (2.5k)

Yes, I do have a sample file as well. It's NOT zipped. Here is what the output files look like: http://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html#top

written 5 months ago by bha (60)

And I imagine these probabilities are not just 0 and 1 but can be, for example, 0.4? But they have to sum to 1, I suppose.

written 5 months ago by Gabriel R. (2.5k)

These probabilities are 0s and 1s, and the haplotypes are also 0s and 1s. What do you suggest?
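
If the haplotype file is the starting point instead, a rough Python sketch of the 0/1 to genotype step might look like the following. It assumes (this is not confirmed in the thread) that each row is one SNP, that consecutive pairs of columns are the two haplotypes of one individual, and that 0 and 1 stand for the first and second allele of that SNP; `haps_row_to_genotypes` is a hypothetical helper name.

```python
def haps_row_to_genotypes(row, allele0, allele1):
    """Pair up 0/1 haplotype codes from one SNP row into diploid genotypes.

    Assumptions for this sketch: columns 2k and 2k+1 belong to individual k,
    code "0" means `allele0` and code "1" means `allele1`.
    """
    codes = row.split()
    if len(codes) % 2 != 0:
        raise ValueError("expected an even number of haplotype columns")

    lookup = {"0": allele0, "1": allele1}
    try:
        alleles = [lookup[code] for code in codes]
    except KeyError as err:
        raise ValueError(f"unexpected haplotype code: {err}") from None

    # Consecutive haplotypes form one individual's genotype.
    return [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]


# Two individuals at a C/T SNP: haplotypes 0,0 and 0,1.
print(haps_row_to_genotypes("0 0 0 1", "C", "T"))
```

The allele letters would have to come from wherever the allele labels for each SNP are stored, since the 0/1 codes alone do not carry them.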

written 5 months ago by bha (60)

But they could potentially be 0.5 0.5 0? Just to be sure.

written 5 months ago by Gabriel R. (2.5k)

Yes, essentially they are. I have both genotype and haplotype files. My main concern is to convert them to PLINK format, either the haplotypes or the genotypes. Any ideas, please?
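
As a rough illustration of the target format, the Python sketch below assembles PLINK text lines from per-SNP genotype calls. The .ped columns (family ID, individual ID, father, mother, sex, phenotype, then two alleles per SNP) and .map columns (chromosome, SNP ID, genetic distance, base-pair position) follow the PLINK text format; the sample IDs `ind1`/`ind2`, the chromosome "1" and the function name `build_ped_map` are placeholders invented for this example.

```python
def build_ped_map(snps, sample_ids, chromosome="1"):
    """Build .ped and .map lines from per-SNP genotype calls.

    `snps` is a list of (rs_id, position, calls), where calls[i] is the
    (allele, allele) pair of individual i. The chromosome is an assumption,
    since the genotype file shown in this thread does not carry one.
    """
    # .map: chromosome, SNP ID, genetic distance (0 = unknown), bp position
    map_lines = [f"{chromosome} {rs_id} 0 {pos}" for rs_id, pos, _ in snps]

    ped_lines = []
    for idx, sample in enumerate(sample_ids):
        alleles = []
        for _, _, calls in snps:
            alleles.extend(calls[idx])
        # Parents "0" and sex 0 mean unknown; phenotype -9 means missing.
        ped_lines.append(" ".join([sample, sample, "0", "0", "0", "-9", *alleles]))
    return ped_lines, map_lines


# SNP3 and SNP4 from the example genotype file, for its two individuals.
snps = [
    ("rs3", 3000, [("C", "C"), ("C", "T")]),
    ("rs4", 4000, [("C", "T"), ("C", "T")]),
]
ped_lines, map_lines = build_ped_map(snps, ["ind1", "ind2"])
print("\n".join(ped_lines))
print("\n".join(map_lines))
```

In practice the real IDs, sex and phenotype would come from the sample file rather than the placeholders used here.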

written 5 months ago by bha (60)

I can code a module to import them using glactools and export to PLINK. We do not currently support this format, but I could code it.

written 5 months ago by Gabriel R. (2.5k)

It seems that GTOOL can do this conversion: http://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#formats

written 5 months ago by bha (60)

Did you try it? Did it work?

written 5 months ago by Gabriel R. (2.5k)

Yes, I think it works well.

written 5 months ago by bha (60)