6.7 years ago
emily111
Hello!
I am trying to do a Kaplan-Meier survival analysis with Cox proportional hazards, using the survival R package, on a small number of SNPs. I have been struggling to find out in what form it is best to load my data. Can I load it into a data frame with tidyverse? If so, does anyone have experience of how to code SNPs? I was thinking of coding them as, for instance, 1 = AA, 2 = AT, 3 = TT.
Many thanks!
The usual encoding is 0 for homozygous reference, 1 for heterozygous and 2 for homozygous variant. The actual nucleotide is often not important.
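To make the 0/1/2 encoding concrete, here is a minimal base R sketch. The data frame `geno`, the SNP column `rs1`, the helper `count_variant()` and the choice of "T" as the variant allele are all made up for illustration; swap in your own columns and alleles.

```r
# Hypothetical toy data: one row per patient, one column per SNP.
geno <- data.frame(
  id  = 1:6,
  rs1 = c("AA", "AT", "TT", "AT", "AA", "TT"),
  stringsAsFactors = FALSE
)

# Assumption: "A" is the reference allele and "T" is the variant allele.
variant_allele <- "T"

# Count the copies of the variant allele in each two-letter genotype string,
# giving 0 (homozygous reference), 1 (heterozygous) or 2 (homozygous variant).
count_variant <- function(genotype, allele) {
  vapply(strsplit(genotype, ""), function(x) sum(x == allele), integer(1))
}

geno$rs1_dosage <- count_variant(geno$rs1, variant_allele)
print(geno)
```

A plain data frame like this (or a tibble, if you prefer tidyverse) is the form the modelling functions in survival expect for their `data` argument.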
As per Wouter's comment, SNPs are usually encoded 0, 1, 2 in the way that he describes. However, there is a key distinction:
For survival analysis, I presume that you are interested in comparing survival between the different genotypes, so you will have to make sure that the genotypes are treated as factors (categorical). If we were doing a GWAS, though, we might leave the genotype as a continuous variable and thereby see the additive effect of our genotype of interest.
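A rough sketch of the factor versus continuous distinction, using simulated data (the `dat` data frame, its column names and the factor labels are invented here, and the simulated times carry no real genotype effect):

```r
library(survival)

# Simulated toy data; replace with your own follow-up times, event
# indicator (1 = event, 0 = censored) and 0/1/2 genotype coding.
set.seed(1)
n <- 120
dat <- data.frame(
  time   = rexp(n, rate = 0.1),
  status = rbinom(n, 1, 0.7),
  dosage = sample(0:2, n, replace = TRUE)
)

# Categorical version of the same SNP: one level per genotype.
dat$genotype <- factor(dat$dosage, levels = 0:2,
                       labels = c("hom_ref", "het", "hom_var"))

# Kaplan-Meier curves, one per genotype group.
km <- survfit(Surv(time, status) ~ genotype, data = dat)

# Cox model with genotype as a factor: a separate hazard ratio for
# het and hom_var, each relative to hom_ref.
cox_factor <- coxph(Surv(time, status) ~ genotype, data = dat)

# Cox model with genotype as a number: a single per-allele (additive)
# hazard ratio.
cox_additive <- coxph(Surv(time, status) ~ dosage, data = dat)

print(km)
print(coef(cox_factor))
print(coef(cox_additive))
```

The factor model estimates two coefficients (het and hom_var against the reference group), while the numeric model estimates one. Which one you want depends on whether you are comparing genotype groups or assuming each extra variant allele has the same effect.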