Question: R survival analysis SNPs loading data
emily111 wrote:

Hello!

I am trying to run a Kaplan-Meier survival analysis and a Cox proportional hazards model with the survival R package on a small number of SNPs. I have been struggling to work out in what form it is best to load my data. Can I load it into a data frame with tidyverse? If so, does anyone have experience of how to code SNPs? I was thinking of coding them as, for instance, 1 = AA, 2 = AT, 3 = TT.

Many thanks!

snp R gene • 299 views
modified 3 days ago by Biostar • written 9 weeks ago by emily111

The usual encoding is 0 for homozygous reference, 1 for heterozygous and 2 for homozygous variant, i.e. the number of variant alleles carried. The actual nucleotide is often not important.

written 9 weeks ago by WouterDeCoster
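
A minimal sketch of the 0/1/2 encoding described above, using a plain base R data frame (a tidyverse tibble would work the same way). The data, the column names `genotype` and `snp_dosage`, the helper `count_variant`, and the choice of "T" as the variant allele are all hypothetical and only serve as illustration.

```r
# Hypothetical example data: one row per patient, genotype stored as text.
dat <- data.frame(
  id       = 1:6,
  genotype = c("AA", "AT", "TT", "TA", "AA", NA),
  stringsAsFactors = FALSE
)

# Assumption for this sketch: "A" is the reference allele, "T" the variant.
variant_allele <- "T"

# Count the variant alleles in each genotype string:
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous variant.
# Missing genotypes stay NA.
count_variant <- function(gt, variant) {
  alleles <- strsplit(gt, "")
  vapply(alleles, function(a) {
    if (length(a) == 1 && is.na(a)) NA_integer_ else sum(a == variant)
  }, integer(1))
}

dat$snp_dosage <- count_variant(dat$genotype, variant_allele)
print(dat)
```

Counting alleles rather than mapping whole strings means that "AT" and "TA" receive the same code, which is usually what is wanted for unphased genotypes.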

As per Wouter's comment, SNPs are usually encoded 0, 1, 2 in the way that he describes. However, there is a key distinction:

  • encode as 0, 1, 2 and treat as a continuous variable
  • encode as 0, 1, 2 and treat as factors/categories

For survival, I presume that you are interested in comparing survival between genotypes, so you will have to ensure that the genotypes are stored as factors rather than numbers. If we were doing a GWAS, though, we might leave the genotype as a continuous variable and thereby estimate the additive (per-allele) effect of the variant of interest.

modified 1 day ago • written 2 days ago by Kevin Blighe
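
To illustrate the distinction between the factor and continuous encodings, here is a hedged sketch using the survival package. The data are simulated purely for illustration, and the column names (`time`, `status`, `snp`, `snp_factor`) and genotype labels are assumptions, not taken from the original question.

```r
library(survival)

# Simulated, hypothetical data: follow-up time, event indicator, SNP as 0/1/2.
set.seed(1)
n <- 200
dat <- data.frame(
  time   = rexp(n, rate = 0.1),   # follow-up time (arbitrary units)
  status = rbinom(n, 1, 0.7),     # 1 = event observed, 0 = censored
  snp    = sample(0:2, n, replace = TRUE, prob = c(0.5, 0.4, 0.1))
)

# Categorical encoding: one group per genotype.
dat$snp_factor <- factor(dat$snp, levels = 0:2,
                         labels = c("hom_ref", "het", "hom_var"))

# Kaplan-Meier curves and a log-rank test by genotype group.
km <- survfit(Surv(time, status) ~ snp_factor, data = dat)
lr <- survdiff(Surv(time, status) ~ snp_factor, data = dat)

# Cox model with genotype as a factor:
# one hazard ratio per non-reference genotype (het and hom_var vs hom_ref).
cox_factor <- coxph(Surv(time, status) ~ snp_factor, data = dat)

# Cox model with genotype as a number:
# a single per-allele (additive) hazard ratio.
cox_additive <- coxph(Surv(time, status) ~ snp, data = dat)

print(km)
print(lr)
print(summary(cox_factor))
print(summary(cox_additive))
```

The factor model makes no assumption about how the heterozygous and homozygous variant groups relate to each other, whereas the numeric model assumes that each additional variant allele multiplies the hazard by the same amount.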