Question: What is the relationship between PLINK .ped files and .tped files?
3.7 years ago by
haohanw90
United States
haohanw90 wrote:

I wonder what the relationship is between PLINK .tped and .ped files. From what I observe, it seems to be more complicated than a simple transpose.

For example, Section 4.1.1 of this manual gives the following example:

1 1 0 0 1 1 1 1 G G
1 2 0 0 2 1 0 0 A G
1 3 0 0 1 1 1 1 A G
1 4 0 0 2 1 2 1 A A

is transposed as

1 snp1 0 10001 1 1 0 0 1 1 2 1
1 snp2 0 20001 G G G A G A A A

rather than what I thought it should be:

1 snp1 0 10001 1 1 0 0 1 1 2 1
1 snp2 0 20001 G G A G A G A A

Why is the allele order reversed here ("A G" becoming "G A")?
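
For reference, the plain transpose described in the question can be sketched in a few lines of Python. This is only an illustration of the expected column-to-row mapping, not how PLINK itself does the conversion; the function name `naive_ped_to_tped` is made up for this sketch, and the MAP rows are read off the first four columns of the .tped lines quoted above.

```python
# Naive PED -> TPED transpose: keeps each genotype's alleles in input order.
# The PED rows are the example quoted in the question.
ped_lines = [
    "1 1 0 0 1 1 1 1 G G",
    "1 2 0 0 2 1 0 0 A G",
    "1 3 0 0 1 1 1 1 A G",
    "1 4 0 0 2 1 2 1 A A",
]

# (chromosome, SNP id, genetic distance, position), taken from the first
# four columns of the quoted .tped lines.
map_rows = [
    ("1", "snp1", "0", "10001"),
    ("1", "snp2", "0", "20001"),
]


def naive_ped_to_tped(ped_lines, map_rows):
    """Hypothetical helper: one TPED row per SNP, alleles in sample order."""
    samples = [line.split() for line in ped_lines]
    tped = []
    for i, snp in enumerate(map_rows):
        start = 6 + 2 * i  # the first six PED columns are sample metadata
        alleles = []
        for fields in samples:
            alleles.extend(fields[start:start + 2])
        tped.append(" ".join(list(snp) + alleles))
    return tped


tped_lines = naive_ped_to_tped(ped_lines, map_rows)
for line in tped_lines:
    print(line)
```

This reproduces the second pair of .tped lines above (with "A G" left as "A G"), so the difference from PLINK's output is only in the order of the two alleles within the heterozygous genotypes.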

And I don't think this reversal is guaranteed to happen: in the example in Section 3.4 of the same manual, it's hard to see any pattern for when a genotype is reversed and when it isn't.

(I am quite new to this area, and I hope the reason is not something so basic that it counts as common sense in this domain.)

snp plink gwas • 2.1k views
3.7 years ago by
Philipp Bayer6.4k
Australia/Perth/UWA
Philipp Bayer6.4k wrote:

Interesting, I didn't know about that! Could it be that PLINK internally just sorts the alleles using some arbitrary rules?
I just ran a test with input alleles "G A", "A G" in various combinations with other SNPs and they always came out as "G A" in the transposed dataset.

Similarly, "G T"/"T G" always becomes "G T" and "G C"/"C G" always becomes "G C", while "A T"/"T A" always becomes "A T" and "A C"/"C A" becomes "A C". So the order can't be alphabetical: "G A" and "G C" are not in alphabetical order, although "A T" and "A C" are.

The funny thing is, if I repeat the same thing using PLINK2, I get alphabetically sorted alleles: your example becomes G G  A G  A G  A A (and my test-cases become alphabetically sorted, too). That makes me think that it's rather arbitrary and doesn't particularly matter.
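
One way to check the "doesn't particularly matter" reading is to compare two .tped rows while treating each genotype as an unordered pair of alleles. The sketch below does that for the two snp2 rows from the question; `unordered_genotypes` is a hypothetical helper written for this post, not part of PLINK, and it assumes unphased genotypes where allele order carries no information.

```python
def unordered_genotypes(tped_line):
    """Hypothetical helper: genotypes of a TPED row as sorted allele pairs."""
    fields = tped_line.split()
    alleles = fields[4:]  # the first four TPED columns describe the SNP
    return [tuple(sorted(alleles[i:i + 2])) for i in range(0, len(alleles), 2)]


# The snp2 row as reported for PLINK 1.07, and the row the asker expected.
plink_row = "1 snp2 0 20001 G G G A G A A A"
expected_row = "1 snp2 0 20001 G G A G A G A A"

same_genotypes = unordered_genotypes(plink_row) == unordered_genotypes(expected_row)
print(same_genotypes)
```

Under that assumption, both rows describe the same four genotypes, which is consistent with the ordering being a presentation detail rather than a change in the data.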

 

Edit: I think it has to do with the way PLINK 1.07 stores genotypes as numbers - if you run

    plink --file mytest --recode --transpose

you get the above inconsistent behaviour, but if you run

    plink --file mytest --recode12 --transpose

so that all genotypes are recoded numerically, you'll always see "1 2" for all test cases, so these genotypes seem to be sorted numerically rather than alphabetically!
