Question: Convert SNP dataset from PACKEDANCESTRYMAP to plink (PED)
Asked 18 months ago by BlastedBadger70 (Thule)

I have downloaded the "Affymetrix Human Origins Curated" dataset from David Reich's lab, but I am at a loss to understand which format it is in and how I could convert it to something usable by plink.

So far, I have downloaded the convertf utility from the AdmixTools package. Based on the convertf README, I assume that the .geno, .snp and .ind files are in "PACKEDANCESTRYMAP" format.

I attempted to convert them to PED using the following "parfile" for convertf:

genotypename:    panel1.geno
snpname:         panel1.snp
indivname:       panel1.ind
outputformat:    PED
genotypeoutname: panel1-PED.ped
snpoutname:      panel1-PED.map
indivoutname:    panel1-PED.pedind

Running convertf -p parfile seems to work, but plink does not accept the output!

I tried this command to test:

 plink1 --noweb --ped panel1-PED.ped --map panel1-PED.map --make-bed --out panel1-BED

And it failed like this:

@----------------------------------------------------------@
|        PLINK!       |     v1.07      |   10/Aug/2009     |
|----------------------------------------------------------|
|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|        http://pngu.mgh.harvard.edu/purcell/plink/        |
@----------------------------------------------------------@

Skipping web check... [ --noweb ]
Writing this text to log file [ panel1-BED.log ]
Analysis started: Fri Oct 13 11:51:03 2017

Options in effect:
        --noweb
        --ped panel1-PED.ped
        --map panel1-PED.map
        --make-bed
        --out panel1-BED


ERROR: Problem with MAP file line:
1  Affx-4964829     0.013491      1349123 A G

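So the MAP file is incorrectly formatted: plink expects four columns (chromosome, SNP ID, genetic distance, base-pair position), but this file has two extra, unwanted columns at the end (the alleles, A and G in the line above).

My question is: why doesn't convertf output a correct MAP format? And is it safe to remove these last two columns using awk or sed? (I did so, and plink then seemed to complete the conversion.)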
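
For reference, stripping the two trailing columns can be sketched with awk as follows. This is only a sketch: the function name trim_map and the output file name in the usage note are made up for illustration, and it assumes every line of the MAP file carries chromosome, SNP ID, genetic distance and base-pair position in its first four whitespace-separated fields, as in the line quoted in the error message.

```shell
# Hypothetical helper: print only the first four columns of a MAP file.
# $1 = path to the six-column MAP file written by convertf.
trim_map() {
    # awk splits on runs of whitespace and re-joins with single spaces.
    awk '{ print $1, $2, $3, $4 }' "$1"
}
```

It could be used as trim_map panel1-PED.map > panel1-PED-4col.map, with the new file then passed to plink via --map. Writing to a separate file keeps the original, including its allele columns, intact.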


The convertf from AdmixTools is really not working as it should. For example, the .fam file produced with the "PACKEDPED" output format no longer contains any population information: the first column (family IDs) is just a row number. I am going to try with Eigensoft.

Alright, convertf from Eigensoft does the same.
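
If the population labels are still wanted in the plink files, one possible workaround is to copy them back from the .ind file. The sketch below assumes that the third column of the .ind file holds the population label and that convertf writes the .fam rows in the same order as the .ind rows (it prints a warning when the sample IDs disagree). The function name relabel_fam is hypothetical.

```shell
# Hypothetical helper: replace the family ID (column 1) of a .fam file
# with the population label (column 3) of the matching .ind row.
# $1 = .ind file, $2 = .fam file. Rows are matched by position, which
# is an assumption about convertf's output order, not a documented fact.
relabel_fam() {
    awk '
        NR == FNR { pop[FNR] = $3; id[FNR] = $1; next }   # first file: .ind
        $2 != id[FNR] {
            print "warning: sample ID mismatch on row " FNR > "/dev/stderr"
        }
        { $1 = pop[FNR]; print }                          # second file: .fam
    ' "$1" "$2"
}
```

For example, relabel_fam panel1.ind panel1-BED.fam > panel1-BED.relabelled.fam would write a new .fam file to inspect before replacing the original; the file names here are placeholders.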
