single tumor vs multiple normal sample differential gene expression (RNA-Seq ) analysis using DESeq2
1
1
Entering edit mode
4 weeks ago
sumitguptabt ▴ 20

Hi, I am new in bioinformatics and have a tricky problem to solve. I downloaded multiple RNA-seq datasets from TCGA GDC. My goal is to find upregulated and downregulated genes in each tumor sample after comparing it with all normals samples, e.g., "single tumor" vs. "all normal sample count" or "single tumor" vs. "single normal sample count"). This is needed to develop a model for finding new targets in cancer treatment. I looked for DESeq2 and followed a few tutorials in the last 3 days, but none worked with the single tumor sample. Can anyone guide me, how to do it? I really appreciate any help you can provide. Thank you Sumit

RNA-Seq DESeq2 • 866 views
0
Entering edit mode

Kevin Blighe and Devon Ryan dear sir can you help me in this regard? Thanks in advance.

1
Entering edit mode

It would be really beautiful if you can write an r script for the same.

Please do not request that people do your work for you. Ask for a lead or clarification on a concept and write the code yourself.

0
Entering edit mode

Dear Ram Thank you so much for your comment. I modified my comment as you suggested. I can understand your concern but it is very difficult for me to understand everything correctly in r. Anyway, I am trying to do it by myself but if you can provide any kind of assistance, it would be really nice. Thank you Sumit

1
Entering edit mode

All of us start like that, but through practice, we get to a place of relative ease of use with any new technology. Try by yourself, and ask us if you have specific questions.

0
Entering edit mode

Thank you for your comment. My question is How to do the differential gene expression analysis of "single tumor RNA seq count " vs. "multiple normal RNA seq count" or "single tumor RNA seq count" vs. "single normal RNA seq count" using either DEseq2 or EdgeR. BW Sumit

1
Entering edit mode

What have you tried? Have you read the DESeq2 vignette? What is the code you have so far?

0
Entering edit mode

I tried EdgeR and DEseq2 however DEseq2 was showing an error during the creation of the DEseq2DataSet object from the matrix, It was only when I use single tumor sample. The EdgeR was also showing an error during "DGEList" and asking for minimum 3 data samples for each group so I just duplicated the single column into 3 and run it. I do not know if it is the right way to do but I am getting differentially expressed genes.

The Edge code was

library(edgeR)
mobData <- rawCountTable <- read.delim("1.txt", row.names=1)
tail(mobData)
mobDataGroups <- c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor")
d <- DGEList(counts=mobData,group=factor(mobDataGroups))
dim(d)
head(d$counts) head(cpm(d)) apply(d$counts, 2, sum)
keep <- rowSums(cpm(d)> 1) >= 2
d <- d[keep,]
dim(d)
d$samples$lib.size <- colSums(d$counts) d$samples
d <- calcNormFactors(d)
plotMDS(d, method="bcv", col=as.numeric(d$samples$group))
legend("bottomleft", as.character(unique(d$samples$group)), col=1:3, pch=20)
d1 <- estimateCommonDisp(d, verbose=T)
names(d1)
d1 <- estimateTagwiseDisp(d1)
names(d1)
plotBCV(d1)
design.mat <- model.matrix(~ 0 + d$samples$group)
colnames(design.mat) <- levels(d$samples$group)
d2 <- estimateGLMCommonDisp(d,design.mat)
d2 <- estimateGLMTrendedDisp(d2,design.mat, method="power")
d2 <- estimateGLMTagwiseDisp(d2,design.mat)
plotBCV(d2)
et12 <- exactTest(d1, pair=c(1,2)) # compare groups 1 and 2


The output was

I would like to ask whether it is the right approach and if not then please suggest to me the right way to do it. Another thing is et12 pair=c(1,2)) (last code line) output is the upregulated or downregulated genes in tumor or normal tissue? I searched "https://rdrr.io/bioc/edgeR/man/exactTest.html" and found that it is group 2-group1 so I it is for tumor but just want to reconfirm it from you. I heard about DESeq2 vignette but I do not know much about it. I will check it. Thank you so much for your kind attention. Waiting to hear from you. BW Sumit

0
Entering edit mode

The EdgeR was also showing an error during "DGEList" and asking for minimum 3 data samples for each group so I just duplicated the single column into 3 and run it. I do not know if it is the right way to do

I don't know edgeR but I don't think this is the right way to go. I'm not entirely sure though, maybe someone else can give you a more certain answer. As for the comparison, I think positive logFC is upregulated in Tumor. I'm guessing this from the manual and the fact that levels(factor(rep(c("Normal","Tumor"), each = 3)))[1] is Normal and levels(factor(rep(c("Normal","Tumor"), each = 3)))[2] is Tumor.

3
Entering edit mode
4 weeks ago
ATpoint 49k

Please go through the answers listed from the edgeR senior author here: edgeR - DE with one disease sample vs. multiple control samples?

And please do not copy/paste samples to create more samples, this is plain wrong.

It would also help (in the future) to provide the code you use if you complain about an analysis not working. You can perfectly create a DGEList and run a DE analysis with a single sample in one group, at bare minimum 1 vs 2, e.g.:

library(edgeR)

ngenes <- 1000
nsamples <- 3
Counts <- matrix(rnbinom(ngenes*nsamples,mu=5,size=2),ngenes,nsamples)
rownames(Counts) <- 1:ngenes
y <- DGEList(counts=Counts, group=c(rep("A", 2), "B"))

y$samples ## here is your 2 vs 1 y<-calcNormFactors(y) y$design <- model.matrix(~group,y$samples) y<-estimateDisp(y, design=y$design)
y$fit<-glmQLFit(y, design=y$design)
topTags(glmQLFTest(y$fit,coef=2),n=Inf)  ADD COMMENT 0 Entering edit mode Dear ATpoint Thank you so much for your help, I can not describe my happiness. I run your script and got the output as you see in the attached picture. The code I entered: library(edgeR) setwd("/home/sumit/Test/10") rawdata <- read.delim("TableS1.txt", check.names=FALSE, stringsAsFactors=FALSE) ngenes <- 15668 #Total 15668 genes so I replaced 1000 with 15668 nsamples <- 6 Counts <- matrix(rnbinom(ngenes*nsamples,mu=5,size=2),ngenes,samples) rownames(Counts) <- 1:ngenes y <- DGEList(counts=Counts, group=c(rep("Normal", 5), "Tumor")) y$samples ## here is your 2 vs 1
y<-calcNormFactors(y)
y$design <- model.matrix(~group,y$samples)
y<-estimateDisp(y, design=y$design) y$fit<-glmQLFit(y, design=y$design) topTags(glmQLFTest(y$fit,coef=2),n=Inf)


May I ask if it is the correct way or it needs some modification? I wonder how the 6th line "Counts" know the count data column?

I got the LogFC value after running the last line of the script (I believe it is for the tumor sample as you already said), however, the first column of output was filled with some random values as you see in the picture below. If you do not mind then may I ask how to get gene ID in the first line?

Sorry for my bad English and also I apologize for any inconvenience due to uploading pictures. This is for only your kind reference purpose and I promise that I will delete it and format the comment as you said earlier.

Thank you in advance

Sumit

0
Entering edit mode

Dear, please help me, I am new in this field. Please tell me if it is the right way or needs some correction.

0
Entering edit mode

I tried another code and also using manual but I do not know where I can fit the line suggested by you

Counts <- matrix(rnbinom(ngenes*nsamples,mu=5,size=2),ngenes,samples)


The code I used and It seems working

library(edgeR)
setwd("/home/sumit/Test/11forTCC")
d<-raw.data[,2:7]
rownames(d)<-raw.data[,1]
group<-factor(c("Normal","Normal","Normal","Normal","Normal","Tumor"))
design<-model.matrix(~group)
d<-DGEList(counts=d,group=group)
dim(d)
d <- calcNormFactors(d)
keep <- rowSums(cpm(d)>100) >= 2
d <- d[keep,]
dim(d)
d<-estimateGLMCommonDisp(d,design,method="deviance",robust="TRUE",subset=NULL)
d$samples$lib.size <- colSums(d\$counts)
d
bcv <- 0.2
et <- exactTest(d, dispersion=bcv^2)
top<-topTags(et)
top
summary(de <- decideTestsDGE(et))
plotMDS(d)


The output of the "top" was "Comparison of groups: Tumor-Normal" Can you check if it is correct? Please Thank you Sumit

1
Entering edit mode

When users are online and feel like responding they will, me personally I do not comment at midnight on a Sunday so adding comments asking for help will not speed things up. We are driven by volunteers after all, keep that in mind ;-)

0
Entering edit mode

Hi,

Like ATPoint said, it was a Sunday evening and all of us have personal and professional lives outside of here, so we cannot cater to questions constantly/immediately.

In any case, I am not familiar with edgeR and reading the manual or searching the Bioconductor support forum would be more useful.

Also, please stop using "dear" - "Dear sir/madam" etc is outdated, and using "dear" alone is odd to say the least and in the worst case, it sounds creepy when you're addressing strangers.