I am working on a genome-wide association study. I wish to examine the time to onset of a disease in relation to the alleles present at each SNP (survival analysis). I have done this for a few snps using STATA, but this won't work for a whole genome dataset so I am trying to do it in R.
My files are in PLINK format, and I have included the time to event data in the fam section of the read in plink data using the merge command.
My script is below:
#Read in plink files GWAS<-read.plink("R.bed","R.bim","R.fam") pheno <- read.delim("C:/Users/USER/Desktop/pheno.txt") view(pheno) library("GenABEL", lib.loc="~/R/win-library/3.1") #find columns head(GWAS) #merge plink file with followup and presence of disease GWAS$fam <-merge(GWAS$fam,pheno, by.x="member", by.y= "Chip") #this is now included in the plink files #running in cox model using GenABEL package and mlreg command coxm <- mlreg(GWAS((fam$followup),(map$allele.1)),"fam$disease")
However, it's not working, I'm pretty new at this and I'm not sure how to specify the genotype at each snp for each individual. Has anyone ran a genome-wide cox model in R? If so help would be greatly appreciated!
Could you post any error messages that came up. My student successfully did this kind of analysis, but I'd have to find her scripts to see where the differences lie.
The error message comes up as:
coxm <- mlreg(GWAS((fam$followup),(map$allele.1)),"fam$skin")
Error in mlreg(GWAS((fam$followup), (map$allele.1)), "fam$skin") :
data argument should have gwaa.data class
Up to that point works fine. I have also tried:
#GWAS cox modelling
setwd("C:/Users/insight/Desktop")
library("snpStats", lib.loc="~/R/win-library/3.1")
#Read in plink files
GWAS<-read.plink("Rel2-0.bed","Rel2-0.bim","Rel2-0.fam")
pheno <- read.delim("pheno.txt")
view(pheno)
library("GenABEL", lib.loc="~/R/win-library/3.1")
#find columns
head(GWAS)
#merge plink file with followup and skin cancer
GWAS$fam <-merge(GWAS$fam,pheno, by.x="member", by.y= "Chip")
#this is now included in the plink files
#Below using survival package
srenalTime<-with(GWAS, Surv(fam$followup,fam$skin))
srenalFit<-survfit(srenalTime~GWAS$genotypes)
#Get error message here:
Error in model.frame.default(formula = srenalTime ~ GWAS$genotypes) :
variable lengths differ (found for 'GWAS$genotypes')
By any chance, does you data contain NAs?
Yes, actually I think it might. Is there a way to remove these?
Try this,
This will remove all the rows that have at least one NA.
I tried this but it removed the majority of my rows. Is there anyway of replacing the NA's with a value or just allowing for missingness?
Caragh I don't think so