Question: How to convert raw GWAS data to PED and MAP files for PLINK analysis
Young Ho Lee wrote 9.6 years ago:

I am trying to do a meta-analysis of GWAS data, but I am having a lot of trouble managing the raw GWAS data. I cannot convert the raw GWAS data, which I downloaded from the site, to ped and map files for PLINK analysis. The file name looks like this: phs000202.pha002867.txt.

Could you tell me how to convert the "phs000202.pha002867.txt" file to ped and map files (such as phs000202.pha002867.ped and the corresponding map file) for GWAS meta-analysis using PLINK?

Your early response would be greatly appreciated.

Thank you.

Young Ho Lee, MD, PhD.


Can you post an example of the raw format? Are all downloads in the same format? I don't know offhand of a tool that does the conversion for you, but I am guessing you could write your own small script to do it rather quickly!

Reply written 9.6 years ago by Darren J. Fitzpatrick

Hi Young, I'm feeling altruistic and bored. If you give me good examples of the raw format and the format you need, I'll write you a Perl script to do the conversion if you need the help. marypaniscusATgmailDOTcom

Reply written 9.6 years ago by Mary Paniscus

Thank you for your comments. But I do not know how to give you the raw data. Actually, it is just the raw file downloaded from the dbGaP site.

Reply written 9.6 years ago by Young Ho Lee

Do you mean you want to download the whole database? That's certainly possible, but I don't know what ped and map files look like. I'll see if I can't poke at it a bit tomorrow and help you make sense of your issue.

Reply written 9.6 years ago by Mary Paniscus

Docroberson wrote 9.6 years ago:

PED and MAP files are just specific plain-text formats. If you have SNP data, a basic map file needs only the chromosome, a SNP identifier, and the base-pair position for each SNP.

If you have your data in a spreadsheet-like format, what you'll need to do is put the columns in the order chromosome, SNP identifier, position, and then the genotypes. There should be NO headers. The genetic distance column, which normally sits between the SNP identifier and the position, is not necessary if you run PLINK with the --map3 option. All values have to be whitespace separated (either tab or space). I believe the examples show the alleles separated by whitespace, i.e. 'AA' as 'A A', but unless the code has changed I believe you can load it as 'AA' with no problem.
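If you would rather not rely on 'AA' being accepted, the calls can be rewritten as 'A A' before loading. The following is a minimal Python sketch, not a PLINK utility: the function names, the example SNP identifier, and the choice to treat anything that is not two of A/C/G/T as missing (written as '0 0', PLINK's default missing genotype) are all assumptions made for illustration.

```python
# Hypothetical helpers for illustration; adjust VALID_ALLELES if your data
# uses other allele codes (for example insertion/deletion markers).
VALID_ALLELES = set("ACGT")


def split_genotype(call, missing="0"):
    """Return a genotype call as two whitespace-separated alleles.

    'AG' and 'A G' both become 'A G'. Anything that does not resolve to
    exactly two valid alleles is written as the missing genotype.
    """
    alleles = call.split()
    if len(alleles) == 1:
        # Compound form such as 'AG': split into single characters.
        alleles = list(alleles[0])
    if len(alleles) == 2 and all(a in VALID_ALLELES for a in alleles):
        return " ".join(alleles)
    return f"{missing} {missing}"


def format_row(chrom, snp_id, position, calls):
    """Build one space-separated row: chromosome, SNP id, position, genotypes."""
    fields = [str(chrom), snp_id, str(position)]
    fields += [split_genotype(c) for c in calls]
    return " ".join(fields)


row = format_row(1, "rs0001", 12345, ["AA", "AG", "G G", "NA"])
print(row)  # 1 rs0001 12345 A A A G G G 0 0
```

The last call, 'NA', is not a pair of valid alleles, so it is emitted as missing rather than being split into 'N A'.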

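For the ped file you need family id, individual id, father id, mother id, sex and phenotype. The individuals in the rows of the ped file must be in the SAME ORDER as the columns of genotypes in the map file. (Strictly speaking, a layout with one row per SNP and one genotype column per individual is what PLINK calls a transposed fileset, .tped plus .tfam, loaded with --tfile; in a standard .ped/.map pair the genotypes follow the six columns in the ped file, one row per individual.) If you don't have any families, or just don't know the relationships, then you have to put in dummy values. Like this: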

FAM001 SampleName 0 0 0 -9

The 0s for father and mother specify no father and no mother. A 0 for sex (or any number other than 1 and 2) means unknown sex. And the -9 means unknown phenotype. You SHOULD use -9 when the phenotype is unknown; if you have real phenotypes, use those instead. Here is the ped format reference.

There are various options to indicate missing data and alternative formats, but you'll need to dig into the PLINK documentation to decide whether you need any of that. What I tend to do is take a spreadsheet-like file, remove the header, and use the header (minus chromosome and position) to generate the dummy PED file if I don't have any additional information.
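The header trick described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_matrix` is made up, the input is assumed to be whitespace separated with sample names that contain no spaces, the first `n_annotation` header columns (two here, chromosome and position) are assumed to be annotation rather than samples, and each sample gets its own dummy family id in the style of the FAM001 example.

```python
def split_matrix(lines, n_annotation=2):
    """Split a header plus SNP rows into dummy sample rows and header-less SNP rows.

    lines        -- iterable of text lines, the first being the header
    n_annotation -- number of leading non-sample columns (assumed: chr, pos)
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    header, body = rows[0], rows[1:]

    # Every data row must have the same number of columns as the header,
    # otherwise samples and genotypes would no longer line up.
    for i, row in enumerate(body, start=2):
        if len(row) != len(header):
            raise ValueError(f"line {i}: expected {len(header)} columns, got {len(row)}")

    samples = header[n_annotation:]
    # Dummy values: no father, no mother, unknown sex, unknown phenotype.
    sample_rows = [
        f"FAM{i:03d} {name} 0 0 0 -9" for i, name in enumerate(samples, start=1)
    ]
    snp_rows = [" ".join(row) for row in body]
    return sample_rows, snp_rows


example = """chr pos S1 S2
1 1000 AA AG
1 2000 CC CT
"""
sample_rows, snp_rows = split_matrix(example.splitlines())
print("\n".join(sample_rows))
print("\n".join(snp_rows))
```

Because the sample rows are generated from the header in order, they automatically stay in the same order as the genotype columns. Writing the two lists to files is then just a matter of joining them with newlines.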

Answer modified 17 months ago by Ram • written 9.6 years ago by Docroberson