Question

Survival analysis of TCGA data

1

Entering edit mode

2.3 years ago

seta ★ 1.9k

Dear all,

I have got several combined gene expression microarray datasets (output of meta-analysis) of about 400 cancer samples with different cancer subtypes. Using WGCNA analysis, I found some gene modules significantly correlated (r2 about 20%-80%) with the basal subtype. I downloaded two independent datasets (one microarray and another RNA-seq data of TCGA) for doing preservation analysis of the correlated modules in basal subtype samples. The preservation analysis within WGCNA was also showed these modules are conserved in these two datasets. I obtained the hub genes (150 genes) of these conserved modules via WGCNA (GS > 0.2 & MM > 0.8). Finally, I examined the probable effect of these hub genes on the survival rate of individuals with the basal subtype within the above two independent datasets. While I got 15 genes significantly (adjusted p-value < 0.05) that have an effect on survival rate in the microarray dataset, the same results (genes) have not been obtained by analyzing the TCGA dataset. As it’s my first experience with TCGA data and survival analysis, I shared the used codes. Could you please take a look at the codes and kindly tell me if I made any mistakes anywhere?

I downloaded TCGA data from here, and used this tutorial and Regparallele to survival analysis of hub genes.

rna <-read.table("Human__TCGA_BRCA__UNC__RNAseq__HiSeq_RNA__01_28_2016__BI__Gene__Firehose_RSEM_log2.txt", header=T,sep="\t", row.names=1)
table(substr(colnames(rna),14,15))
      01 
    1082
# removing genes with no-expression in more 50% of samples:
fif <- dim(rna)[2]/2
no_exp <- data.frame(count = apply(rna, 1, function(x) length(which(x== 0))))
no_exp$gene <- row.names(no_exp)
no_exp <- no_exp[no_exp$count > fif, ]$gene
rna <- rna[- which(row.names(rna) %in% no_exp), ]

# Data normalization with voom
x <- t(apply(rna, 1, as.numeric))
rna_vm <- voom(x)$E
colnames(rna_vm) <- gsub("\\.","-",substr(colnames(rna),1,12))

#getting z-core from normalized data
rna_vm <- t(scale(t(rna_vm)))

Then I subset the genes of interest (hub genes) and individuals with basal subtype and used RegParallel for doing survival analysis as Kevin kindly explained here.

x<- read.table("subset.txt", header=T, row.names=1)
metadata<-read.table("clinic.txt", header=T)
all((colnames(x) == rownames(metadata)) == TRUE)
[1] TRUE
coxdata <- data.frame(metadata, t(x))
coxdata$dfs <- as.numeric(coxdata$dfs) 
coxdata$time <- as.numeric(coxdata$time))
res <- RegParallel(
    data = coxdata,
    formula = 'Surv(time, dfs) ~ [*] + age + stage',     
    FUN = function(formula, data)
      coxph(formula = formula,
        data = data,
        ties = 'breslow',
        singular.ok = TRUE),
    FUNtype = 'coxph',
    variables = colnames(coxdata)[5:ncol(coxdata)],
    blocksize = 130,                                    
    cores = 2,
    nestedParallel = FALSE,
    conflevel = 95,
    p.adjust = "BH")

Could you please let me know if there is any problem with the above code?

Thanks!

survival analysis TCGA Regparallele WGCNA • 879 views

ADD COMMENT • link 2.3 years ago by seta ★ 1.9k