R survival analysis SNPs loading data
6.7 years ago
emily111 • 0

Hello!

I am trying to run a Kaplan-Meier survival analysis and a Cox proportional hazards model with the survival R package on a small number of SNPs. I have been struggling to work out how, and in what form, it is best to load my data. Can I load it into a data frame with tidyverse? If so, does anyone have experience of how to code SNPs? I was thinking of coding them as, for instance, 1 = AA, 2 = AT, 3 = TT.

Many thanks!

R gene snp • 2.4k views

The usual encoding is 0 for homozygous reference, 1 for heterozygous, and 2 for homozygous variant. The actual nucleotide is often not important.
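
As a rough illustration of that encoding, here is a minimal base R sketch that counts non-reference alleles per genotype call. The `genotype` vector, the choice of "A" as the reference allele, and the helper name `count_alt` are all made up for the example; real data would need the reference allele taken from your own annotation.

```r
# Hypothetical genotype calls for one SNP; the reference allele is ASSUMED to be "A".
genotype <- c("AA", "AT", "TT", "AT", "AA", NA)
ref_allele <- "A"

# Count non-reference alleles in each two-letter call:
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous variant.
# Missing calls stay missing.
count_alt <- function(gt, ref) {
  alleles <- strsplit(gt, "")
  vapply(alleles, function(a) {
    if (anyNA(a)) NA_integer_ else sum(a != ref)
  }, integer(1))
}

dosage <- count_alt(genotype, ref_allele)
print(dosage)
```

The resulting `dosage` vector can sit as an ordinary column in a data frame (or tibble) next to the survival time and event columns.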


As per Wouter's comment, SNPs are usually encoded as 0, 1, 2 in the way that he describes. However, there is a key distinction in how the encoded values are then treated:

  • encode as 0, 1, 2 and treat as a continuous variable
  • encode as 0, 1, 2 and treat as factors/categories

For survival analysis, I presume that you are interested in survival by genotype, so you will have to ensure that you keep them as factors. If we were doing GWAS, though, we might leave the genotype as a continuous variable and therefore see the additive effect of each copy of the allele of interest.
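
To make the factor versus continuous distinction concrete, here is a small sketch with the survival package. The data are simulated purely for illustration, and the column names `dosage`, `time` and `status` are assumptions, not anything from the original question.

```r
library(survival)

# Simulated toy data (NOT real results): one SNP coded 0/1/2,
# a follow-up time, and an event indicator (1 = event, 0 = censored).
set.seed(1)
n <- 120
df <- data.frame(
  dosage = sample(0:2, n, replace = TRUE),
  time   = rexp(n, rate = 0.1),
  status = rbinom(n, 1, 0.7)
)

# Kaplan-Meier curves, one per genotype group.
km <- survfit(Surv(time, status) ~ dosage, data = df)

# Genotype as a factor: one hazard ratio per non-reference genotype,
# each compared against genotype 0.
fit_cat <- coxph(Surv(time, status) ~ factor(dosage), data = df)

# Genotype as a numeric variable: a single hazard ratio per extra
# copy of the variant allele (additive model).
fit_add <- coxph(Surv(time, status) ~ dosage, data = df)

print(summary(fit_cat)$coefficients)
print(summary(fit_add)$coefficients)
```

With three genotype groups, the factor model estimates two coefficients and the additive model estimates one, so the choice affects both interpretation and degrees of freedom. Because a tibble is also a data frame, data read with tidyverse tools can be passed to `data =` in the same way.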
