Question: SNP Genotype Data
Haluk (Lincoln, Nebraska) wrote, 8.6 years ago:

Hi,

I want to cluster HapMap project data using EIGENSTRAT, but I am having difficulty creating the genotype file. The EIGENSTRAT manual says: "The genotype file contains 1 line per SNP. Each line contains 1 character per individual: 0 means zero copies of reference allele. 1 means one copy of reference allele. 2 means two copies of reference allele. 9 means missing data." Below is one row of my (huge) data file:

rs4475691 C/T chr1 836671 CT CC CC CC CC CT CC TT CC CC CC NN (and so on...)

1st column: SNP id
2nd column: alleles
3rd column: chromosome
4th column: position

The remaining columns are the patients' genotypes. I know NN means missing data and should be encoded as 9 in EIGENSTRAT format, but I am not sure how to encode CT, CC and TT.

Any help would be greatly appreciated.
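A minimal Python sketch of the recoding discussed in this thread, assuming the first allele in the second column (C in C/T) is the reference allele; the helper names `encode_genotype` and `hapmap_row_to_geno` are hypothetical. Each call becomes the number of copies of the reference allele, as the quoted manual describes, and NN becomes 9. As far as I know, the SNP id does not go in the .geno file itself (EIGENSTRAT keeps it in a separate .snp file), so it is returned separately here.

```python
def encode_genotype(call, ref, alt):
    """Return copies of `ref` in a two-letter call as '0', '1' or '2'; '9' if missing."""
    if call == "NN":
        return "9"
    if len(call) != 2 or any(allele not in (ref, alt) for allele in call):
        raise ValueError(f"unexpected genotype call: {call!r}")
    return str(call.count(ref))


def hapmap_row_to_geno(line):
    """Split one whitespace-separated row into (snp_id, .geno line)."""
    fields = line.split()
    snp_id, alleles = fields[0], fields[1]
    # Assumption: the first listed allele is the reference allele.
    ref, alt = alleles.split("/")
    calls = fields[4:]  # columns 5 onwards are the individuals' genotypes
    return snp_id, "".join(encode_genotype(c, ref, alt) for c in calls)


# First twelve calls of the example row above.
row = "rs4475691 C/T chr1 836671 CT CC CC CC CC CT CC TT CC CC CC NN"
print(hapmap_row_to_geno(row))  # ('rs4475691', '122221202229')
```

If C turns out not to be the reference allele for a given SNP, only the `ref, alt` assignment needs to change.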

Tags: snp, genotyping
Genotepes (Nantes, France) wrote, 8.6 years ago:

Hi

I'm not sure exactly what you need. Do you need code to convert this into EIGENSTRAT format?

If the question is how to recode CC, TT and CT, then you choose one allele as the reference: you could choose the most frequent one, for instance, or here take the first allele listed in your line (I think; I do not know what format this is).

CC = 0, CT = 1, TT = 2
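If you go with the most frequent allele, a small sketch for picking it from the genotype calls might look like this (the helper `most_frequent_allele` is hypothetical; it ignores NN calls). Note that this picks the major allele in your sample, which is not necessarily the allele in the reference genome; the genotypes would then be coded as copies of that allele, per the manual quoted in the question.

```python
from collections import Counter


def most_frequent_allele(calls):
    """Return the most common allele among non-missing two-letter calls, or None if all are missing."""
    counts = Counter()
    for call in calls:
        if call != "NN":
            counts.update(call)  # updating with a string counts each character: two alleles per call
    if not counts:
        return None
    allele, _ = counts.most_common(1)[0]  # on a tie, the allele seen first wins
    return allele


calls = "CT CC CC CC CC CT CC TT CC CC CC NN".split()
print(most_frequent_allele(calls))  # C
```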

Christian


No, I am not looking for code. I don't understand the idea behind encoding genotypes as 0, 1 or 2. For instance, why did you assign 1 to genotype CT?

Reply by Haluk, 8.6 years ago

Yes, CT is set to 1.

Basically, 0, 1 and 2 are the number of copies of the non-reference allele (chosen arbitrarily; it could be the other allele) in the genotype. The idea is to create a "quantitative" value for each SNP and apply a PCA-based analysis.
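To make the "quantitative" idea concrete: as far as I recall, the EIGENSTRAT paper centres each SNP's 0/1/2 values on their mean and scales them by sqrt(p(1 - p)), where p = (1 + sum of counts) / (2 + 2n) is a smoothed allele-frequency estimate over the n individuals; the PCA is then run on the resulting matrix. The sketch below implements that normalization for one SNP. Setting missing entries to 0 and computing n from the non-missing individuals only are assumptions of this sketch, and the function names are hypothetical, so check the details against the paper before relying on them.

```python
import math


def parse_geno_line(line):
    """Turn a .geno line into counts, using None for missing ('9')."""
    return [None if c == "9" else int(c) for c in line]


def normalize_snp(genos):
    """Centre one SNP on its mean and scale by sqrt(p(1-p)), EIGENSTRAT-style (sketch)."""
    observed = [g for g in genos if g is not None]
    n = len(observed)
    if n == 0:
        raise ValueError("SNP has no observed genotypes")
    mu = sum(observed) / n
    # Smoothed allele-frequency estimate; always strictly between 0 and 1.
    p = (1 + sum(observed)) / (2 + 2 * n)
    scale = math.sqrt(p * (1 - p))
    # Assumption: missing entries are set to 0, i.e. to the SNP's mean after centring.
    return [0.0 if g is None else (g - mu) / scale for g in genos]


x = normalize_snp(parse_geno_line("122221202229"))
print([round(v, 2) for v in x])
```

Homozygotes for the rarer allele end up far from zero, which is what lets rare shared genotypes drive the principal components.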

Hanif: you are right and I am sadly wrong. It was a "typo" in the sense that C is the reference allele.

Sorry about that; I am going to vote -1 on my own message. On the other hand, for the PCA and clustering here, I'd tend to say that the order is not so important, but it is better to be rigorous and do things properly.
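The claim that the choice of reference allele hardly matters for PCA can be checked directly. Recoding a SNP with the other allele as reference maps each count g to 2 - g; with per-SNP mean-centring and scaling by sqrt(p(1 - p)), this just negates that SNP's column, so the individual-by-individual matrix sum_j x_ij x_kj, whose eigenvectors give the principal components, is unchanged. A small Python check on made-up genotypes (the data and function names are hypothetical; missing data is left out for simplicity):

```python
import math


def normalize(genos):
    """Mean-centre one SNP (no missing data) and scale by sqrt(p(1-p))."""
    n = len(genos)
    mu = sum(genos) / n
    p = (1 + sum(genos)) / (2 + 2 * n)
    return [(g - mu) / math.sqrt(p * (1 - p)) for g in genos]


def individual_matrix(snps):
    """snps: one list of 0/1/2 counts per SNP, same individuals in the same order.

    Returns the n x n matrix with entries sum_j x_ij * x_kj / m.
    """
    cols = [normalize(s) for s in snps]
    n, m = len(snps[0]), len(snps)
    return [[sum(c[i] * c[k] for c in cols) / m for k in range(n)] for i in range(n)]


# Hypothetical genotypes: 3 SNPs x 5 individuals.
snps = [[1, 2, 2, 0, 1], [0, 0, 1, 2, 2], [2, 1, 1, 1, 0]]
# Recode the first SNP with the other allele as reference: g -> 2 - g.
flipped = [[2 - g for g in snps[0]]] + snps[1:]

a = individual_matrix(snps)
b = individual_matrix(flipped)
same = all(
    math.isclose(u, v, abs_tol=1e-12)
    for row_a, row_b in zip(a, b)
    for u, v in zip(row_a, row_b)
)
print(same)  # True
```

Only the sign of the loadings for the recoded SNP changes; the coordinates of the individuals, and hence the clustering, stay the same.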

Christian

Reply by Genotepes, 8.6 years ago

Actually, since it's a C/T SNP: CC = 2, CT = 1, TT = 0, NN = 9.

Reply by Hanif Khalak, 8.6 years ago
Larry_Parnell (Boston, MA, USA) wrote, 8.6 years ago:

The answer from Genotepes is fine. Hanif's comment is also OK, but from the limited information we don't really know which allele is the true reference and which is the derived one.

In EIGENSTRAT, any heterozygous genotype is coded as 1 because it has one copy of the reference allele (and one copy of the derived allele).


Actually, I checked on UCSC and found C is .. But again, I am not sure all formats list the alleles as reference/alternative.

In most of the imputation-oriented formats I have come across, as far as I remember, that is the case.

Reply by Genotepes, 8.6 years ago

Right, it should be that way. Sometimes, though, the alleles are simply listed alphabetically.
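When the listed allele order turns out not to be reference/alternative (for example, alphabetical), an already-encoded .geno line can be re-expressed in terms of the other allele by mapping 0 to 2 and 2 to 0, leaving 1 (heterozygotes) and 9 (missing) alone. A sketch, where the true reference allele is assumed to come from an external lookup such as the UCSC check mentioned above; the function names are hypothetical. If neither listed allele matches, the alleles may be reported on the opposite strand, so the sketch raises an error rather than guessing.

```python
def flip_geno_line(line):
    """Re-express a .geno line in terms of the other allele: 0 <-> 2; 1 and 9 unchanged."""
    swap = {"0": "2", "1": "1", "2": "0", "9": "9"}
    return "".join(swap[c] for c in line)  # KeyError on any unexpected character


def orient_to_reference(geno_line, listed_alleles, true_ref):
    """Return geno_line coded as copies of true_ref, given alleles listed like 'C/T'.

    Assumes geno_line currently counts copies of the first listed allele.
    """
    first, second = listed_alleles.split("/")
    if true_ref == first:
        return geno_line
    if true_ref == second:
        return flip_geno_line(geno_line)
    # Neither allele matches: possibly a strand issue, so do not guess.
    raise ValueError(f"{true_ref!r} is not one of {listed_alleles!r}")


print(orient_to_reference("122221202229", "C/T", "T"))  # 100001020009
```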

Reply by Larry_Parnell, 8.6 years ago
farid110ir wrote, 3.0 years ago:

Hello, maybe my question is simple or silly, but I need to ask how I should identify the genotypes of individuals. I am doing an association study, and for that I have the sequence of my gene of interest for each individual. I have done the SNP analysis, and now I don't know the next step for genotyping. Could you please help me?

Powered by Biostar version 2.3.0