Question: R survival analysis SNPs loading data
emily111 wrote (4 months ago):

Hello!

I am trying to do a Kaplan-Meier survival analysis with Cox proportional hazards, using the survival R package, on a small number of SNPs. I have been struggling to find out how, and in what form, it is best to load my data. Can I load it into a data frame with tidyverse? If so, does anyone have experience of how to code SNPs? I was thinking of coding them as, e.g., 1 = AA, 2 = AT, 3 = TT.

Many thanks!


The usual encoding is 0 for homozygous reference, 1 for heterozygous and 2 for homozygous variant, i.e. the number of copies of the variant allele. The actual nucleotide is often not important.
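
As a minimal sketch of that encoding in base R: the genotype calls below are made up, and treating "A" as the reference allele is an assumption for illustration only. The helper name count_variant is hypothetical, not part of any package.

```r
# Hypothetical genotype calls for a single SNP (NA = missing call).
geno <- c("AA", "AT", "TT", "AT", "AA", NA)
ref  <- "A"  # assumed reference allele, for illustration only

# Count how many of the two alleles differ from the reference: 0, 1 or 2.
count_variant <- function(g, ref) {
  if (is.na(g)) return(NA_integer_)
  alleles <- strsplit(g, "")[[1]]
  sum(alleles != ref)
}

dosage <- vapply(geno, count_variant, integer(1), ref = ref, USE.NAMES = FALSE)

# One row per sample, one column per SNP, loads fine as an ordinary data frame.
snps <- data.frame(sample = paste0("S", seq_along(geno)), snp1 = dosage)
print(snps)
```

The same recoding could be done with tidyverse verbs; base R is used here only to keep the example free of extra dependencies.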

Written 4 months ago by WouterDeCoster

As per Wouter's comment, SNPs are usually encoded 0, 1, 2 in the way that he describes. However, there is a key distinction in how the model then treats that variable:

  • encode as 0, 1, 2 and treat as a continuous variable
  • encode as 0, 1, 2 and treat as factors/categories

For survival, I presume that you are interested in comparing survival between the different genotypes, so you will have to ensure that they are treated as factors. If we were doing GWAS, though, we might leave the genotype as a continuous (numeric) variable and thereby estimate the additive effect of each extra copy of the variant allele.
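
To make the distinction concrete, here is a rough sketch using the survival package. The data are simulated purely for illustration, and the column names (time, status, snp, snp_f) are assumptions rather than anything from the original question.

```r
library(survival)

# Simulated, purely illustrative data: follow-up time, event indicator
# (1 = event, 0 = censored) and a SNP coded as 0/1/2.
set.seed(1)
n <- 200
dat <- data.frame(
  time   = rexp(n, rate = 0.1),
  status = rbinom(n, 1, 0.7),
  snp    = sample(0:2, n, replace = TRUE, prob = c(0.5, 0.35, 0.15))
)

# Factor version: each genotype is its own category.
dat$snp_f <- factor(dat$snp, levels = 0:2,
                    labels = c("hom_ref", "het", "hom_var"))

# Kaplan-Meier curves, one per genotype group.
km <- survfit(Surv(time, status) ~ snp_f, data = dat)

# Cox model with genotype as a factor: one hazard ratio per
# non-reference genotype (het and hom_var, each vs hom_ref).
cox_factor <- coxph(Surv(time, status) ~ snp_f, data = dat)

# Cox model with genotype as a number: a single per-allele
# (additive) hazard ratio.
cox_additive <- coxph(Surv(time, status) ~ snp, data = dat)

print(coef(cox_factor))
print(coef(cox_additive))
```

The factor model makes no assumption about how the heterozygote relates to the two homozygotes, at the cost of an extra parameter; the numeric model assumes each additional variant allele multiplies the hazard by the same amount.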

Written 8 weeks ago by Kevin Blighe