Question: SPIA results show FDR=1 for RNA-Seq differential expression data downloaded from TCGA
0
gravatar for asnanizami
2.3 years ago by
asnanizami0
asnanizami0 wrote:

I am a graduate student from India. I have been using tool, SPIA on my lung cancer data downloaded from TCGA to analyze pathway perturbation. It is a RNA- Seq FPKM normalized data. I have further applied between array normalization using EdgeR bioconductor package and used limma to obtain logFC and P-value. The problem is when I run the "spia" function, the results that I get have FDR=1 for all pathways, pSize and pNDE values are equal in each pathway. Below is the codes I've used for limma:

args = commandArgs(TRUE)
control_file = "CONTROL.csv";
disease_file = "LUAD1.csv";
DEfile = "DELUAD1.tsv";

suppressMessages(library(limma));

options(warn=-1)

DE <- function(data1, data2)
{
  merged12 = merge(data1,data2,by="row.names")
  rn = merged12[,1]
  rownames(merged12) = rn;
  Eset = ExpressionSet(assayData=as.matrix(merged12[-1]))
  ID <- featureNames(Eset)

  #Symbol <- getSYMBOL(ID, "org.Hs.eg.db")
  tmp <- data.frame(ID=ID, Symbol=ID, stringsAsFactors = F)
  # Clean up
  tmp[tmp=="NA"] <- NA
  fData(Eset) <- tmp
  # Clean up
  fvarLabels(Eset) <- make.names(fvarLabels(Eset))
  sml=NULL
  for(i in 1:dim(data1)[[2]]) { sml = append(sml,"G0")}
  for(i in 1:dim(data2)[[2]]) { sml = append(sml,"G1")}
  ex <- exprs(Eset)
  # set up the data and proceed with analysis
  fl <- as.factor(sml)
  Eset$description <- fl
  design <- model.matrix(~ description + 0, Eset)
  colnames(design) <- levels(fl)
  fit <- lmFit(Eset, design)
  cont.matrix <- makeContrasts(G1-G0, levels=design)
  fit2 <- contrasts.fit(fit, cont.matrix)
  fit2 <- eBayes(fit2, 0.01)
  gene_list <- topTable(fit2, coef=1, number=1000000, sort.by="logFC")
  write.table (gene_list, file = DEfile, sep = "\t", row.names=F, quote=F);
  return (gene_list)
}

df1= as.data.frame(read.table(file="CONTROL.csv", sep = " ", header=T, row.names=1))
df2= as.data.frame(read.csv(file="LUAD1.csv", sep = ",", header=T,row.names=1))


message("Removing non-variable genes in each dataset...")
library(genefilter)
X = transform(as.matrix(df1), SD=rowSds(as.matrix(df1), na.rm=TRUE)); ## GETTING THE SD OF EACH ROW I.E. GENE
names = rownames(df1); ## GETTING THE IDS OF ALL GENES 
SAMPLE1 = X[names[X$SD > 0.000001],]; ## GETTING THE GENEIDS AND THEIR EXPRESSION VALUES IN ALL SAMPLES IF THE SD OF EXPRESSION VALUES IS > 0
SAMPLE1$SD <- NULL
r1  = rownames(SAMPLE1);
r1 = rownames(df1)

X = transform(as.matrix(df2), SD=rowSds(as.matrix(df2), na.rm=TRUE)); ## GETTING THE SD OF EACH ROW I.E. GENE
names = rownames(df2); ## GETTING THE IDS OF ALL GENES 
SAMPLE2 = X[names[X$SD > 0.000001],]; ## GETTING THE GENEIDS AND THEIR EXPRESSION VALUES IN ALL SAMPLES IF THE SD OF EXPRESSION VALUES IS > 0
SAMPLE2$SD <- NULL
r2  = rownames(SAMPLE2);

genes = intersect(r1,r2); ## getting the list of common genes present in both the datasets 

S1 = SAMPLE1[genes,]; 
S2 = SAMPLE2[genes,]; 

library(Biobase)
DELUAD1 = DE(S2,S1)

and the code for SPIA is:

library(SPIA)
options(digits=3)
head(DELUAD1)

tg1<-DELUAD1[DELUAD1$adj.P.Val<0.1,]
DE_LUADI=tg1$logFC
names(DE_LUADI)<-as.vector(tg1$ID)
ALL_LUADI=tg1$ID

DE_LUADI[1:10]
ALL_LUADI[1:10]

# pathway analysis based on combined evidence; # use nB=2000 or more for more accurate results
res=spia(de=DE_LUADI,all=ALL_LUADI,organism="hsa",nB=2000,plots=T,beta=NULL,combine="fisher",verbose=FALSE)

#show first 15 pathways, omit KEGG links
res[1:20,-12]

The results looks like this:

                                         Name    ID pSize NDE pNDE      tA pPERT     pG pGFdr pGFWER    Status
1                       Olfactory transduction 04740   290 290    1 -1710.8 0.002 0.0144     1      1 Inhibited
2                         Serotonergic synapse 04726    71  71    1  -114.1 0.009 0.0514     1      1 Inhibited
3          Antigen processing and presentation 04612    49  49    1   609.8 0.011 0.0606     1      1 Activated
4                 Systemic lupus erythematosus 05322    95  95    1   458.3 0.013 0.0695     1      1 Activated
5              Staphylococcus aureus infection 05150    36  36    1   833.3 0.024 0.1135     1      1 Activated
6    Natural killer cell mediated cytotoxicity 04650    74  74    1  -736.4 0.032 0.1421     1      1 Inhibited
7                                Legionellosis 05134    32  32    1   336.6 0.035 0.1523     1      1 Activated
8      Progesterone-mediated oocyte maturation 04914    48  48    1   344.0 0.037 0.1590     1      1 Activated
9                       Fanconi anemia pathway 03460    30  30    1   -12.8 0.038 0.1623     1      1 Inhibited
10                                   Pertussis 05133    46  46    1   368.6 0.039 0.1655     1      1 Activated

Please help as to where I am going wrong. It'll be a great help. I have been struggling on it from a very long time.

ADD COMMENTlink modified 2.3 years ago by theobroma221.1k • written 2.3 years ago by asnanizami0
2
gravatar for theobroma22
2.3 years ago by
theobroma221.1k
theobroma221.1k wrote:

In the spia function de and all should not be the same size since de is a very small subset of all. So,

ALL_LAUDI = DELAUD1$ID.
ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by theobroma221.1k

It solved the error. Thank you!

ADD REPLYlink written 2.3 years ago by asnanizami0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1538 users visited in the last hour