Question: Difficulty in analysis of Differential gene expression of Firehose data
0
gravatar for lawarde.ankita1
17 months ago by
lawarde.ankita130 wrote:

Dear Sir/madam,

I am working on LUAD data of gene expression, this data is RNA-SeqV2 RSEM normalized data. I want to find out the DEG between Tumor stage I and Stage IV. But im facing some problems, I'm using EBSeq package to find the DE genes, Below is the R code i have used (followed according to the EBSeq vignette)

rna <- read.table("RNA/LUAD_RNA_seq.txt",nrows = 20533, header = T, row.names = 1, sep = "\t")
#removing zero expressions
rem <- function(x){
  x <- as.matrix(x)
  x<- t(apply(x,1,as.numeric))
  r <- as.numeric(apply(x,1,function(i)sum(i == 0)))
  remove <- which(r > dim(x)[2]*0.5) 
  return(remove)
}
remove <- rem(rna)
rna <- rna[-remove,]
dim(rna)

x <- t(apply(rna,1,as.numeric))
dim(x)

#removing probes without symbol
row.has.na <- apply(x, 1, function(x){anyis.na(x))})
sumrow.has.na)

x <- x[!row.has.na,]
dim(x)

colnames(x) <- gsub('\\.','-',substr(colnames(rna),1,28))
head(x)

library(EBSeq)
str(x)

#read files of tumor stage
library(xlsx)

Tumor_stage <- read.xlsx(file = "Clinical_merged_picked/Tumor_stage_data.xlsx", sheetIndex = 1)
colnames(Tumor_stage)
Tumor_stage$Stage_I_Barcode <- toupper(Tumor_stage$Stage_I_Barcode)
Tumor_stage$Stage_IV_Barcode <- toupper(Tumor_stage$Stage_IV_Barcode)

#select only tumor samples
Tumor_Samples <- subset(x, select = substr(colnames(x),14,14) == '0')
dim(Tumor_Samples)

#subset the stages (I and IV) of tumor samples from Tumor_samples
Stage_I <- subset(Tumor_Samples, select = substr(colnames(Tumor_Samples),1,12) %in% Tumor_stage$Stage_I_Barcode)
dim(Stage_I)

Stage_IV <- subset(Tumor_Samples, select = substr(colnames(Tumor_Samples),1,12) %in% Tumor_stage$Stage_IV_Barcode)
dim(Stage_IV)

#remove duplicate colnames
Stage_I <- Stage_I[, !duplicated(colnames(Stage_I))]
Stage_IV <- Stage_IV[, !duplicated(colnames(Stage_IV))]

stage_d126samples <- cbind(Stage_I[,1:100], Stage_IV[,1:26])
dim(stage_d126samples)
#library size factor
#1. by median normalization approach,
size_d126 = MedianNorm(stage_d126samples)
## running EBSeq on gene expression estimates
## the function EBTest is used to detect the DE genes
EBOut_d126 = EBTest(Data = stage_d126samples, Conditions = as.factor(c(rep("stage_i",100),rep("stage_iv",26))), sizeFactors = size_d126, maxround = 5)
EBDE_d126 = GetDEResults(EBOut_d126, FDR = 0.05)
EBDE_d126$DEfound

GeneFC_d126 = PostFC(EBOut_d126)
PlotPostVsRawFC(EBOut_d126,GeneFC_d126)
par(mfrow = c(1,2))
QQP(EBOut_d126)
par(mfrow = c(1,2))
DenNHist(EBOut_d126)

EBOut_d$Alpha
EBOut_d$Beta

#subsetting DEG from gene expression data
idx_126 <- EBDE_d126$DEfound
eset <- EBOut_d126$DataNorm[idx_126,]
dim(eset)
head(eset)

#row clustering
hr <- hclust(as.dist(1-cor(t(eset), method="pearson")),method="complete")
par(mfrow = c(1,1))
plot(as.dendrogram(hr))

colnames(eset) <- c(rep("stage_i",100),rep("stage_iv",26))
hc <-  hclust(as.dist(1-cor(eset, method="spearman")), method="complete")
plot(as.dendrogram(hc))


library(gplots)
heatmap.2(eset,main="Hierarchical Cluster",Rowv=as.dendrogram(hr), Colv=NA, dendrogram="row", scale="row", col=colorpanel(75,"green","black","red"), density.info="none", trace="none")

here are the images of the plots generated, DEG heatmap Posterior FC plot QQ plot density plot

As in my heatmap, I can not see any differentiating pattern

1) Please explain me if I'm doing anything or using any command in wrong.

I know i have used normalized counts as a input to EBSeq, but i'm not so sure if i have done the right thing.

I have referred to one question answers (FAQ) link about EBSeq, where i have found this, In general, it is not appropriate to perform cross sample comparisons using TPM, FPKM or RPKM without further normalization. Instead, you may use normalized counts (It can be generated by GetNormalizedMat() function from raw count, note EBSeq testing functions takes raw counts and library size factors)

2) So according to above statement, i can give normalized counts to EBSeq ? if my understanding is not correct then please tell me. It will surely clear my doubts.

Your suggestions will make me clear my problem. I look forward to hear from you,

Thanks, Ankita

ebseq rsem rna-seq tcga R • 564 views
ADD COMMENTlink written 17 months ago by lawarde.ankita130
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1506 users visited in the last hour