single tumor vs multiple normal sample differential gene expression (RNA-Seq ) analysis using DESeq2
1
1
Entering edit mode
4 weeks ago
sumitguptabt ▴ 20

Hi, I am new in bioinformatics and have a tricky problem to solve. I downloaded multiple RNA-seq datasets from TCGA GDC. My goal is to find upregulated and downregulated genes in each tumor sample after comparing it with all normals samples, e.g., "single tumor" vs. "all normal sample count" or "single tumor" vs. "single normal sample count"). This is needed to develop a model for finding new targets in cancer treatment. I looked for DESeq2 and followed a few tutorials in the last 3 days, but none worked with the single tumor sample. Can anyone guide me, how to do it? I really appreciate any help you can provide. Thank you Sumit

RNA-Seq DESeq2 • 866 views
ADD COMMENT
0
Entering edit mode

Kevin Blighe and Devon Ryan dear sir can you help me in this regard? Thanks in advance.

ADD REPLY
1
Entering edit mode

It would be really beautiful if you can write an r script for the same.

Please do not request that people do your work for you. Ask for a lead or clarification on a concept and write the code yourself.

ADD REPLY
0
Entering edit mode

Dear Ram Thank you so much for your comment. I modified my comment as you suggested. I can understand your concern but it is very difficult for me to understand everything correctly in r. Anyway, I am trying to do it by myself but if you can provide any kind of assistance, it would be really nice. Thank you Sumit

ADD REPLY
1
Entering edit mode

All of us start like that, but through practice, we get to a place of relative ease of use with any new technology. Try by yourself, and ask us if you have specific questions.

ADD REPLY
0
Entering edit mode

Thank you for your comment. My question is How to do the differential gene expression analysis of "single tumor RNA seq count " vs. "multiple normal RNA seq count" or "single tumor RNA seq count" vs. "single normal RNA seq count" using either DEseq2 or EdgeR. BW Sumit

ADD REPLY
1
Entering edit mode

What have you tried? Have you read the DESeq2 vignette? What is the code you have so far?

ADD REPLY
0
Entering edit mode

I tried EdgeR and DEseq2 however DEseq2 was showing an error during the creation of the DEseq2DataSet object from the matrix, It was only when I use single tumor sample. The EdgeR was also showing an error during "DGEList" and asking for minimum 3 data samples for each group so I just duplicated the single column into 3 and run it. I do not know if it is the right way to do but I am getting differentially expressed genes.

Data structure

The Edge code was

library(edgeR)
mobData <- rawCountTable <- read.delim("1.txt", row.names=1)
head(mobData)
tail(mobData)
mobDataGroups <- c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor")
d <- DGEList(counts=mobData,group=factor(mobDataGroups))
dim(d)
head(d$counts)
head(cpm(d))
apply(d$counts, 2, sum)
keep <- rowSums(cpm(d)> 1) >= 2
d <- d[keep,]
dim(d)
d$samples$lib.size <- colSums(d$counts)
d$samples
d <- calcNormFactors(d)
plotMDS(d, method="bcv", col=as.numeric(d$samples$group))
legend("bottomleft", as.character(unique(d$samples$group)), col=1:3, pch=20)
d1 <- estimateCommonDisp(d, verbose=T)
names(d1)
d1 <- estimateTagwiseDisp(d1)
names(d1)
plotBCV(d1)
design.mat <- model.matrix(~ 0 + d$samples$group)
colnames(design.mat) <- levels(d$samples$group)
d2 <- estimateGLMCommonDisp(d,design.mat)
d2 <- estimateGLMTrendedDisp(d2,design.mat, method="power")
d2 <- estimateGLMTagwiseDisp(d2,design.mat)
plotBCV(d2)
et12 <- exactTest(d1, pair=c(1,2)) # compare groups 1 and 2

The output was

Output

I would like to ask whether it is the right approach and if not then please suggest to me the right way to do it. Another thing is et12 pair=c(1,2)) (last code line) output is the upregulated or downregulated genes in tumor or normal tissue? I searched "https://rdrr.io/bioc/edgeR/man/exactTest.html" and found that it is group 2-group1 so I it is for tumor but just want to reconfirm it from you. I heard about DESeq2 vignette but I do not know much about it. I will check it. Thank you so much for your kind attention. Waiting to hear from you. BW Sumit

ADD REPLY
0
Entering edit mode

The EdgeR was also showing an error during "DGEList" and asking for minimum 3 data samples for each group so I just duplicated the single column into 3 and run it. I do not know if it is the right way to do

I don't know edgeR but I don't think this is the right way to go. I'm not entirely sure though, maybe someone else can give you a more certain answer. As for the comparison, I think positive logFC is upregulated in Tumor. I'm guessing this from the manual and the fact that levels(factor(rep(c("Normal","Tumor"), each = 3)))[1] is Normal and levels(factor(rep(c("Normal","Tumor"), each = 3)))[2] is Tumor.

ADD REPLY
3
Entering edit mode
4 weeks ago
ATpoint 49k

Please go through the answers listed from the edgeR senior author here: edgeR - DE with one disease sample vs. multiple control samples?

And please do not copy/paste samples to create more samples, this is plain wrong.

It would also help (in the future) to provide the code you use if you complain about an analysis not working. You can perfectly create a DGEList and run a DE analysis with a single sample in one group, at bare minimum 1 vs 2, e.g.:

library(edgeR)

ngenes <- 1000
nsamples <- 3
Counts <- matrix(rnbinom(ngenes*nsamples,mu=5,size=2),ngenes,nsamples)
rownames(Counts) <- 1:ngenes
y <- DGEList(counts=Counts, group=c(rep("A", 2), "B"))

y$samples ## here is your 2 vs 1

y<-calcNormFactors(y)
y$design <- model.matrix(~group,y$samples)
y<-estimateDisp(y, design=y$design)
y$fit<-glmQLFit(y, design=y$design)
topTags(glmQLFTest(y$fit,coef=2),n=Inf)
ADD COMMENT
0
Entering edit mode

Dear ATpoint Thank you so much for your help, I can not describe my happiness. I run your script and got the output as you see in the attached picture. The code I entered:

library(edgeR)
setwd("/home/sumit/Test/10")
rawdata <- read.delim("TableS1.txt", check.names=FALSE, stringsAsFactors=FALSE)
ngenes <- 15668 #Total 15668 genes so I replaced 1000 with 15668
nsamples <- 6
Counts <- matrix(rnbinom(ngenes*nsamples,mu=5,size=2),ngenes,samples) 
rownames(Counts) <- 1:ngenes
y <- DGEList(counts=Counts, group=c(rep("Normal", 5), "Tumor"))
y$samples ## here is your 2 vs 1
y<-calcNormFactors(y)
y$design <- model.matrix(~group,y$samples)
y<-estimateDisp(y, design=y$design)
y$fit<-glmQLFit(y, design=y$design)
topTags(glmQLFTest(y$fit,coef=2),n=Inf)

May I ask if it is the correct way or it needs some modification? I wonder how the 6th line "Counts" know the count data column?

I got the LogFC value after running the last line of the script (I believe it is for the tumor sample as you already said), however, the first column of output was filled with some random values as you see in the picture below. If you do not mind then may I ask how to get gene ID in the first line?

Sorry for my bad English and also I apologize for any inconvenience due to uploading pictures. This is for only your kind reference purpose and I promise that I will delete it and format the comment as you said earlier.

Thank you in advance

Sumit

Console output

ADD REPLY
0
Entering edit mode

Dear, please help me, I am new in this field. Please tell me if it is the right way or needs some correction.

ADD REPLY
0
Entering edit mode

I tried another code and also using manual but I do not know where I can fit the line suggested by you

Counts <- matrix(rnbinom(ngenes*nsamples,mu=5,size=2),ngenes,samples)

The code I used and It seems working

library(edgeR)
setwd("/home/sumit/Test/11forTCC")
raw.data<-read.table(file="TableS1.txt",header=T)
head(raw.data)
d<-raw.data[,2:7]
rownames(d)<-raw.data[,1]
group<-factor(c("Normal","Normal","Normal","Normal","Normal","Tumor"))
design<-model.matrix(~group)
d<-DGEList(counts=d,group=group)
dim(d)
d <- calcNormFactors(d)
keep <- rowSums(cpm(d)>100) >= 2
d <- d[keep,]
dim(d)
d<-estimateGLMCommonDisp(d,design,method="deviance",robust="TRUE",subset=NULL)
d$samples$lib.size <- colSums(d$counts)
d
bcv <- 0.2
et <- exactTest(d, dispersion=bcv^2)
top<-topTags(et)
top
summary(de <- decideTestsDGE(et))
plotMDS(d)

The output of the "top" was "Comparison of groups: Tumor-Normal" Can you check if it is correct? Please Thank you Sumit

ADD REPLY
1
Entering edit mode

When users are online and feel like responding they will, me personally I do not comment at midnight on a Sunday so adding comments asking for help will not speed things up. We are driven by volunteers after all, keep that in mind ;-)

As for your code, you obviously should load your counts, not these dummy data I made up with the nbinom function for the sake of providing any data to run the example code. A general advise would be to follow the edgeR manual. It contains all necessary information. Do not make up a bcv, estimate it using the replicated normals that you have. If you just give it a value then the analysis is kind of pointless. There is no data driven evidence this value is robust. WIth a single sample versus the normals you accept that the dispersion is estimated from the normals, so if the tumor is very heterogeneous you will get more DEGs than there actually are, and if the tumor is more homogeneous than the normals then you get fewer DEGs, that is exactly why single-sample analysis is suboptimal. Is this really necessary for your project, why not taking all tumors as group? You do not have to answer this to me, answer it to yourself and discuss with your PI. For code advise please read the manual https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf It has a quick start section that will probably do ok here for a simple groupwise comparison. You might want to try and find local guidance if you want to base downstream experiments on the results of this comparison, an online community is not the right please to "check" your code in terms of you having confidence in it, after all this here comes without any warranty and we do not know what your project is all about. Your seem to be new to the field, which is ok as we all were at some point, but there are mistakes to be made (which is also normal) so if this is the start of a project (maybe even a long one) then try to get experienced help at your institute.

ADD REPLY
0
Entering edit mode

Hi,

Like ATPoint said, it was a Sunday evening and all of us have personal and professional lives outside of here, so we cannot cater to questions constantly/immediately.

In any case, I am not familiar with edgeR and reading the manual or searching the Bioconductor support forum would be more useful.

Also, please stop using "dear" - "Dear sir/madam" etc is outdated, and using "dear" alone is odd to say the least and in the worst case, it sounds creepy when you're addressing strangers.

ADD REPLY

Login before adding your answer.

Traffic: 2419 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6