How To Convert Raw Gwas Data To Ped And Map File For Plink Analysis.
12.8 years ago
Young Ho Lee ▴ 30

I am trying to do a meta-analysis of GWAS data, but I am having a lot of trouble managing the raw GWAS data. I cannot convert the raw GWAS data, which I downloaded from the site http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap, to ped and map files for PLINK analysis. The file name looks like this: phs000202.pha002867.txt.

Could you tell me how to convert the "phs000202.pha002867.txt" file to ped and map files (such as phs000202.pha002867.ped and phs000202.pha002867.map) for GWAS meta-analysis using PLINK?

Your early response would be greatly appreciated.

Thank you.

Young Ho Lee, MD, PhD.

gwas • 17k views

Can you post an example of the raw format? Are all downloads in the same format? I don't know offhand of a tool that does the conversion for you, but I am guessing you can write a small script that does it rather quickly!


Hi Young, I'm feeling altruistic and bored. If you give me good examples of the raw format and the format you need, I'll write you a Perl script to do the conversion, if you need the help. marypaniscusATgmailDOTcom


Thank you for your comments. But I do not know how to give you the raw data. Actually, it is just the raw file downloaded from the dbGaP site.


Do you mean you want to download the whole database? That's certainly possible, but I don't know what ped and map files look like. I'll see if I can poke at it a bit tomorrow and help you make sense of your issue.

12.8 years ago
Docroberson ▴ 310

PED and MAP files are just specific formats. If you have SNP data, all you need for a basic map file are the chromosome, SNP identifier, and base-pair position of each SNP.

http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#map

If you have your data in a spreadsheet-like format, what you'll need to do is put the columns in the order chromosome, SNP identifier, and then base-pair position, with one row per SNP. There should be NO headers. The genetic distance column is not necessary if you run PLINK with the --map3 option. All values have to be whitespace separated (either tab or space). I believe the examples show the alleles separated by whitespace, i.e. 'AA' as 'A A', but unless the code has changed I believe you can load it as 'AA' with no problem.
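As a rough illustration of the three-column map layout that goes with --map3 (chromosome, SNP identifier, base-pair position, no header), here is a minimal Python sketch. The function name `map_lines` and the SNP records are made up for illustration and do not come from the dbGaP file.

```python
def map_lines(snps):
    """Format (chromosome, snp_id, bp_position) tuples as 3-column MAP rows.

    There is no header row and no genetic-distance column, so PLINK
    would have to be run with --map3 to read the result.
    """
    return [f"{chrom}\t{snp_id}\t{pos}" for chrom, snp_id, pos in snps]


# Hypothetical SNP records, for illustration only.
example_snps = [("1", "rs0000001", 10000), ("1", "rs0000002", 20000)]

if __name__ == "__main__":
    print("\n".join(map_lines(example_snps)))
```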

For the ped file, each row is one individual and starts with family ID, individual ID, father ID, mother ID, sex and phenotype, followed by that individual's genotypes. The genotype columns in the ped file have to be in the SAME ORDER as the SNP rows in the map file. If you don't have any families, or just don't know the relationships, then you have to put in dummy values. Like this:

FAM001 SampleName 0 0 0 -9

The 0s for father and mother mean that no parents are specified. A sex code of 0 (or any number other than 1 and 2) means unknown sex, and -9 means unknown phenotype. You SHOULD use -9 when the phenotype is unknown; if you have real phenotypes, use those. Here is the ped format reference.

http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped
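The dummy values above can be generated for a whole list of samples with a few lines of Python. This is only a sketch: the function name `dummy_ped_prefix` and the sample names are hypothetical, and putting every sample into a single family `FAM001` is an assumption that mirrors the example line.

```python
def dummy_ped_prefix(sample_names, family_id="FAM001"):
    """Return the six leading PED fields for each sample, using dummy values."""
    # father = 0 and mother = 0 (no parents given), sex = 0 (unknown),
    # phenotype = -9 (unknown)
    return [f"{family_id} {name} 0 0 0 -9" for name in sample_names]


if __name__ == "__main__":
    # Hypothetical sample names, for illustration only.
    for line in dummy_ped_prefix(["Sample1", "Sample2"]):
        print(line)
```

The genotypes for each sample would still have to be appended to each of these lines.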

There are various options to indicate missing data and alternative formats, but you'll need to dig into the PLINK documentation to decide whether you need any of that. What I tend to do is take a spreadsheet-like file, remove the header, and use the header (minus the chromosome and position columns) to generate the dummy PED file if I don't have any additional information.
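Putting the pieces together, the workflow described above could look roughly like the following Python sketch. Everything about the input layout is an assumption made for illustration, not the actual dbGaP format: a header row `chr snp pos` followed by one column per sample, then one whitespace-separated row per SNP with two-letter genotypes such as `AG`. The function name `convert` and the sample data are hypothetical.

```python
import io


def convert(table, family_id="FAM001"):
    """Turn a SNP-by-sample table into (map_lines, ped_lines).

    Assumed input: header `chr snp pos sample1 sample2 ...`, then one row
    per SNP with genotypes written as two upper-case letters (e.g. AG).
    Any other genotype value is treated as missing.
    """
    header = table.readline().split()
    samples = header[3:]  # the header minus the SNP annotation columns
    map_lines = []
    genotypes = {name: [] for name in samples}  # allele pairs per sample
    for line in table:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, snp_id, pos = fields[:3]
        map_lines.append(f"{chrom} {snp_id} {pos}")  # 3 columns: needs --map3
        for name, call in zip(samples, fields[3:]):
            if len(call) == 2 and set(call) <= set("ACGT"):
                pair = f"{call[0]} {call[1]}"  # 'AG' -> 'A G'
            else:
                pair = "0 0"  # 0 is PLINK's default missing-genotype code
            genotypes[name].append(pair)
    # One PED row per sample: dummy pedigree fields, then genotypes in MAP order.
    ped_lines = [
        f"{family_id} {name} 0 0 0 -9 " + " ".join(genotypes[name])
        for name in samples
    ]
    return map_lines, ped_lines


# Hypothetical input, for illustration only.
example = io.StringIO(
    "chr snp pos S1 S2\n"
    "1 rs0000001 10000 AA AG\n"
    "1 rs0000002 20000 CT NA\n"
)
map_lines, ped_lines = convert(example)
```

Writing `map_lines` and `ped_lines` to files with a shared prefix, say `mydata.map` and `mydata.ped` (a made-up name), should give a pair that PLINK can read with `--file mydata --map3`. The sketch transposes the table because the ped file wants one row per individual, while the assumed input has one row per SNP.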
