Question: Genome-wide cox regression in R
2
gravatar for Caragh
4.6 years ago by
Caragh 40
Ireland
Caragh 40 wrote:

I am working on a genome-wide association study. I wish to examine the time to onset of a disease in relation to the alleles present at each SNP (survival analysis). I have done this for a few snps using STATA, but this won't work for a whole genome dataset so I am trying to do it in R. 

My files are in PLINK format, and I have included the time to event data in the fam section of the read in plink data using the merge command.

 

My script is below: 

#Read in plink files
GWAS<-read.plink("R.bed","R.bim","R.fam")
pheno <- read.delim("C:/Users/USER/Desktop/pheno.txt")
view(pheno)

library("GenABEL", lib.loc="~/R/win-library/3.1")

#find columns 
head(GWAS)

#merge plink file with followup and presence of disease
GWAS$fam <-merge(GWAS$fam,pheno, by.x="member", by.y= "Chip")

#this is now included in the plink files
#running in cox model using GenABEL package and mlreg command
coxm <- mlreg(GWAS((fam$followup),(map$allele.1)),"fam$disease")

 

However, it's not working, I'm pretty new at this and I'm not sure how to specify the genotype at each snp for each individual. Has anyone ran a genome-wide cox model in R? If so help would be greatly appreciated!

 

ADD COMMENTlink modified 3.3 years ago by ImanMeZ0 • written 4.6 years ago by Caragh 40

Could you post any error messages that came up. My student successfully did this kind of analysis, but I'd have to find her scripts to see where the differences lie.

ADD REPLYlink written 4.6 years ago by russhh4.4k

The error message comes up as:

coxm <- mlreg(GWAS((fam$followup),(map$allele.1)),"fam$skin")
Error in mlreg(GWAS((fam$followup), (map$allele.1)), "fam$skin") :
  data argument should have gwaa.data class

 

Up to that point works fine. I have also tried: 

 

#GWAS cox modelling

setwd("C:/Users/insight/Desktop")

library("snpStats", lib.loc="~/R/win-library/3.1")

#Read in plink files

GWAS<-read.plink("Rel2-0.bed","Rel2-0.bim","Rel2-0.fam")

pheno <- read.delim("pheno.txt")

view(pheno)

library("GenABEL", lib.loc="~/R/win-library/3.1")


#find columns 
head(GWAS)


#merge plink file with followup and skin cancer

GWAS$fam <-merge(GWAS$fam,pheno, by.x="member", by.y= "Chip")

#this is now included in the plink files

 

#Below using survival package

srenalTime<-with(GWAS, Surv(fam$followup,fam$skin)) 


srenalFit<-survfit(srenalTime~GWAS$genotypes)

#Get error message here: Error in model.frame.default(formula = srenalTime ~ GWAS$genotypes) : 
  variable lengths differ (found for 'GWAS$genotypes')

ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by Caragh 40
1

By any chance, does you data contain NAs?

ADD REPLYlink written 4.6 years ago by komal.rathi3.4k

Yes, actually I think it might. Is there a way to remove these?

ADD REPLYlink written 4.6 years ago by Caragh 40

Try this,

data = data[complete.cases(data),]

This will remove all the rows that have at least one NA.

ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by komal.rathi3.4k

I tried this but it removed the majority of my rows. Is there anyway of replacing the NA's with a value or just allowing for missingness?

ADD REPLYlink written 4.6 years ago by Caragh 40

Caragh  I don't think so

ADD REPLYlink written 4.6 years ago by komal.rathi3.4k
3
gravatar for Caragh
4.6 years ago by
Caragh 40
Ireland
Caragh 40 wrote:

I figured it out, thank's everyone for the help. Thought I'd post up the script I used in case anyone else has similar issues. 

What my problem was, was that I had the data in the wrong format, so I converted my bed, bim and fam files to ped and map using --recode in plink. The files could then be read into R and converted again to the gwaa.data class. Once it was read in the rest was straight forward.

#Load package

source("http://bioconductor.org/biocLite.R")
biocLite('GenABEL')


#open package

library('GenABEL')

#set working directory

setwd("C:/where_my_files_are")


#Convert to gwaa.data class

convert.snp.ped(ped='Rel2-0.ped',mapfile='Rel2-0.map', out='rel2-0.out')

survivaldata <- load.gwaa.data(phenofile = "pheno.txt", genofile = "rel2-0.out")


#run quality control

qc1 <- check.marker(survivaldata)


qcGWAS <- survivaldata[qc1$idok, qc1$snpok]


#Run cox proportional hazards model
coxresults <- mlreg(GASurv(survivaldata@phdata$followuptime, survivaldata@phdata$event)~survivaldata@phdata$age, data=qcGWAS)


#View top 20 results
descriptives.scan(results, top=20)
ADD COMMENTlink modified 4.6 years ago • written 4.6 years ago by Caragh 40
0
gravatar for ImanMeZ
3.3 years ago by
ImanMeZ0
Germany
ImanMeZ0 wrote:

I was wondering what's the format of the pheno file "pheno.txt"

as I am trying to upload mine and I get this error message 

Error in load.gwaa.data(pheno = "pheno.txt", geno = "GWA.out") :
  the filed named "id", containing the identifier presented in both pheno- and geno- files was not found in the phenofile

My pheno file format 

str(Pheno)

 $ id          : chr  "ID1" "ID2" "ID3" ....
 $ Sex         : int  1 1 1 1 2 2 1 2 1 2 ...
 $ Status      : int  0 1 0 0 0 1 0 0 1 0 ...
 $ Obs         : chr  "71.8" "71.8" "52.2" "0.3" ...
 $ clinicalobs1      : chr  "1" "1" "1" "0" ...
 $ clinicalobs2     : int  1 0 0 0 1 1 1 0 0 0 ...

 

Thanks :) 

 

ADD COMMENTlink written 3.3 years ago by ImanMeZ0

Hi,here's my pheno file structure. Hope this will help!

    > str(survivaldata@phdata)
'data.frame':   142 obs. of  5 variables:
 $ id      : chr  "TB5943" "TB5459" "TB1000231" "62" ...
 $ month_os: num  58.2 59.8 36.2 35.2 30 ...
 $ status  : int  0 0 0 0 1 0 1 0 1 0 ...
 $ sex     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ age     : int  45 60 47 61 79 45 72 50 63 45 ...
ADD REPLYlink written 2.5 years ago by ShirleyDai40

Hi ImanMez, I am wondering if your problems solved here. I now encounter the same problem with you that I kept get "the filed named "id", containing the identifier presented in both pheno- and geno- files was not found in the phenofile" while i do have it. My pheno file:

str(a)

'data.frame': 209 obs. of 14 variables:

$ id : int 2 3 4 5 6 7 8 9 10 11 ...

$ age : int 59 74 61 29 59 32 54 53 41 70 ...

$ sex : int 1 1 1 0 0 0 0 1 1 1 ...

$ CTCAE : int 3 3 1 1 2 1 1 1 0 1 ...

Thanks in advance!

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by dxldw0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1052 users visited in the last hour