Question: R survival analysis SNPs loading data
2.6 years ago, emily111 wrote:


I am trying to do a Kaplan-Meier survival analysis and fit a Cox proportional hazards model with the survival R package on a small number of SNPs. I have been struggling to work out in what form it is best to load my data. Can I load it into a data frame with tidyverse? If so, does anyone have experience of how to code SNPs? I was thinking of coding them as, for instance, 1 = AA, 2 = AT, 3 = TT.

Many thanks!

snp R gene • 1.5k views

The usual encoding is 0 for homozygous reference, 1 for heterozygous, and 2 for homozygous variant. The actual nucleotide is often not important.
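
As a rough illustration of that encoding, here is a minimal sketch in base R. The column names (`time`, `status`, `rs_example`), the toy values, and the choice of "A" as the reference allele are all assumptions made up for this example; the helper `count_alt` is hypothetical and simply counts non-reference alleles in a two-letter genotype string. A tibble read in with the tidyverse should behave the same way as the plain data frame used here.

```r
# Toy data: made-up values and hypothetical column names, for illustration only
dat <- data.frame(
  time       = c(5, 8, 12, 3, 9, 15),
  status     = c(1, 0, 1, 1, 0, 1),   # 1 = event observed, 0 = censored
  rs_example = c("AA", "AT", "TT", "AT", "AA", "TT"),
  stringsAsFactors = FALSE
)

# Assumed reference allele for this SNP
ref_allele <- "A"

# Hypothetical helper: count non-reference alleles in each genotype string
# (0 = homozygous reference, 1 = heterozygous, 2 = homozygous variant)
count_alt <- function(genotype, ref) {
  alleles <- strsplit(genotype, "")
  vapply(alleles, function(a) sum(a != ref), integer(1))
}

dat$rs_example_dosage <- count_alt(dat$rs_example, ref_allele)
```

With this coding, AA becomes 0, AT becomes 1, and TT becomes 2, rather than the 1/2/3 scheme proposed in the question.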

written 2.6 years ago by WouterDeCoster

As per Wouter's comment, SNPs are usually encoded 0, 1, 2 in the way that he describes. However, there is a key distinction:

  • encode as 0, 1, 2 and treat as a continuous variable
  • encode as 0, 1, 2 and treat as factors/categories

For survival analysis, I presume that you are interested in survival by genotype, so you will have to ensure that you leave them as factors. If we were doing a GWAS, though, we might leave the genotype as a continuous variable and therefore see the additive effect of our genotype of interest.
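
The factor versus continuous distinction can be sketched with the survival package as follows. This is only an illustration under stated assumptions: the data are invented, and the column names `time`, `status`, `genotype`, and `genotype_f` are hypothetical. `Surv`, `survfit`, and `coxph` are the package's own functions.

```r
library(survival)

# Toy data: made-up values and hypothetical column names, for illustration only
dat <- data.frame(
  time     = c(5, 8, 12, 3, 9, 15, 7, 11, 4, 14, 6, 10),
  status   = c(1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0),   # 1 = event, 0 = censored
  genotype = c(0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 1, 2)    # 0/1/2 encoding
)

# Treat genotype as a factor: one Kaplan-Meier curve per genotype group
dat$genotype_f <- factor(dat$genotype, levels = c(0, 1, 2))
km <- survfit(Surv(time, status) ~ genotype_f, data = dat)

# Cox model with the factor: separate coefficients for 1 vs 0 and 2 vs 0
cox_factor <- coxph(Surv(time, status) ~ genotype_f, data = dat)

# Cox model with the numeric column: a single per-allele (additive) coefficient
cox_additive <- coxph(Surv(time, status) ~ genotype, data = dat)
```

Calling `summary(cox_factor)` and `summary(cox_additive)` shows the difference: the factor model reports two hazard ratios against the reference genotype, while the numeric model reports one hazard ratio per additional variant allele.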

modified 2.5 years ago • written 2.5 years ago by Kevin Blighe