Question

Single tumor vs. multiple normal samples: differential gene expression (RNA-Seq) analysis using DESeq2

3.5 years ago

sumitguptabt ▴ 30

Hi, I am new to bioinformatics and have a tricky problem to solve. I downloaded multiple RNA-seq datasets from TCGA GDC. My goal is to find upregulated and downregulated genes in each tumor sample by comparing it with all normal samples, e.g., "single tumor" vs. "all normal samples" or "single tumor" vs. "single normal sample". This is needed to develop a model for finding new targets for cancer treatment. I looked into DESeq2 and followed a few tutorials over the last 3 days, but none of them worked with a single tumor sample. Can anyone guide me on how to do it? I really appreciate any help you can provide. Thank you, Sumit

Tags: RNA-Seq, DESeq2

Question by sumitguptabt, 3.5 years ago; updated 3.5 years ago by Ram

Dear Kevin Blighe and Devon Ryan, could you help me with this? Thanks in advance.

Reply by sumitguptabt, 3.5 years ago

It would be really beautiful if you could write an R script for this.

Please do not request that people do your work for you. Ask for a lead or clarification on a concept and write the code yourself.

Reply by Ram, 3.5 years ago

Dear Ram, thank you so much for your comment. I modified my post as you suggested. I understand your concern, but it is very difficult for me to understand everything correctly in R. Anyway, I am trying to do it myself, but any kind of assistance you can provide would be really nice. Thank you, Sumit

Reply by sumitguptabt, 3.5 years ago

All of us start like that, but through practice, we get to a place of relative ease of use with any new technology. Try by yourself, and ask us if you have specific questions.

Reply by Ram, 3.5 years ago

Thank you for your comment. My question is: how do I perform differential gene expression analysis of "single tumor RNA-seq counts" vs. "multiple normal RNA-seq counts", or "single tumor RNA-seq counts" vs. "single normal RNA-seq counts", using either DESeq2 or edgeR? Best wishes, Sumit
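For the "one tumor vs. several normals" case, DESeq2 can in principle fit a two-group design even when one group has a single sample, because the dispersion is estimated from the residual degrees of freedom contributed by the normal replicates. The sketch below is a minimal illustration on simulated counts (the matrix, sample names, and the choice of 5 normals are all assumptions, not the poster's data); the calls used are the standard `DESeqDataSetFromMatrix()`, `DESeq()` and `results()` functions. Results for a single tumor sample say nothing about tumor-to-tumor variability, so they should be read cautiously.

```r
# Minimal sketch on SIMULATED data (hypothetical): 1 tumor sample vs 5 normal samples
library(DESeq2)

set.seed(1)
n_genes  <- 1000
n_normal <- 5
mu <- 2^runif(n_genes, 3, 10)   # per-gene mean counts spread over a realistic range

# Columns are filled one sample at a time, so rep(mu, times = ...) keeps gene i on row i
counts <- matrix(
  as.integer(rnbinom(n_genes * (n_normal + 1), mu = rep(mu, times = n_normal + 1), size = 10)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  c(paste0("normal", seq_len(n_normal)), "tumor1"))
)

# Sample table: one row per column of the count matrix, in the same order.
# Setting the levels explicitly makes Normal the reference level.
coldata <- data.frame(
  condition = factor(c(rep("Normal", n_normal), "Tumor"), levels = c("Normal", "Tumor")),
  row.names = colnames(counts)
)

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ condition)

# 6 samples, 2 coefficients: 4 residual df, all coming from the normal replicates
dds <- DESeq(dds)

# log2FoldChange > 0 means higher in the tumor sample than in the normals
res <- results(dds, contrast = c("condition", "Tumor", "Normal"))
summary(res)
head(res[order(res$padj), ])
```

For real TCGA data, `counts` would be replaced by the downloaded count matrix and `coldata` by a table labelling each column as Normal or Tumor; running this once per tumor sample gives one result table per tumor.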

Reply by sumitguptabt, 3.5 years ago

What have you tried? Have you read the DESeq2 vignette? What is the code you have so far?

Reply by Ram, 3.5 years ago

I tried edgeR and DESeq2. DESeq2 showed an error while creating the DESeqDataSet object from the matrix, but only when I used a single tumor sample. edgeR also showed an error at "DGEList" and asked for a minimum of 3 samples per group, so I just duplicated the single tumor column into 3 columns and ran it. I do not know if this is the right way to do it, but I am getting differentially expressed genes.
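Duplicating the single tumor column makes the software run, but the copies are not independent replicates. A small base-R illustration with hypothetical counts for one gene shows the problem: the "replicate" tumor values have zero variance, so the tumor group appears perfectly consistent, and the test treats the copies as extra independent samples it does not really have. Both effects would tend to make p-values look more significant than the data justify.

```r
# Hypothetical counts for ONE gene (illustration only, not real data)
normal     <- c(120, 95, 140)   # three genuine normal samples
tumor_once <- 300               # the single real tumor sample
tumor_dup  <- rep(tumor_once, 3)  # the tumor column copied three times

var(normal)     # genuine sample-to-sample variability among normals
var(tumor_dup)  # exactly 0: copies carry no information about tumor variability
```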

[Image: data structure]

The edgeR code was:

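```r
library(edgeR)

# Count table: first column = gene IDs (used as row names), one column per sample
mobData <- read.delim("1.txt", row.names = 1)
head(mobData)
tail(mobData)

# NOTE: the three "Tumor" columns are copies of a single tumor sample
mobDataGroups <- c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor")
d <- DGEList(counts = mobData, group = factor(mobDataGroups))
dim(d)
head(d$counts)
head(cpm(d))
apply(d$counts, 2, sum)   # library size of each sample

# Keep genes with CPM > 1 in at least 2 samples
keep <- rowSums(cpm(d) > 1) >= 2
d <- d[keep, ]
dim(d)
d$samples$lib.size <- colSums(d$counts)   # recompute library sizes after filtering
d$samples
d <- calcNormFactors(d)   # TMM normalization factors

plotMDS(d, method = "bcv", col = as.numeric(d$samples$group))
# Two groups, so two colours (col = 1:3 would list a colour with no group)
legend("bottomleft", levels(d$samples$group),
       col = seq_along(levels(d$samples$group)), pch = 20)

# Dispersion estimates for the classic exact test
d1 <- estimateCommonDisp(d, verbose = TRUE)
names(d1)
d1 <- estimateTagwiseDisp(d1)
names(d1)
plotBCV(d1)

# GLM dispersion estimates (d2 is not used by exactTest below)
design.mat <- model.matrix(~ 0 + d$samples$group)
colnames(design.mat) <- levels(d$samples$group)
d2 <- estimateGLMCommonDisp(d, design.mat)
d2 <- estimateGLMTrendedDisp(d2, design.mat, method = "power")
d2 <- estimateGLMTagwiseDisp(d2, design.mat)
plotBCV(d2)

# Exact test of group 2 (Tumor) vs group 1 (Normal): logFC = Tumor relative to Normal
et12 <- exactTest(d1, pair = c(1, 2))
topTags(et12)   # top-ranked genes
```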

The output was:

[Image: output]

I would like to ask whether this is the right approach and, if not, what the right way to do it is. Another question: does the output of et12 (the last line of code, exactTest(d1, pair=c(1,2))) give the genes up- or downregulated in tumor or in normal tissue? I searched "https://rdrr.io/bioc/edgeR/man/exactTest.html" and found that the comparison is group 2 minus group 1, so I think it refers to tumor, but I just want to reconfirm it with you. I have heard about the DESeq2 vignette but do not know much about it; I will check it. Thank you so much for your kind attention. Waiting to hear from you. Best wishes, Sumit
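For the "one tumor vs. one normal" case, neither edgeR nor DESeq2 can estimate the dispersion, since there are no replicates at all. The edgeR User's Guide instead suggests, as a last resort, assuming a fixed biological coefficient of variation (BCV) and passing it to `exactTest()` via its `dispersion` argument; it cites about 0.4 as typical for human data. The sketch below uses simulated counts (hypothetical) and that assumed BCV rather than duplicated columns. The resulting p-values depend entirely on the guessed BCV, so they are only a rough ranking.

```r
# Minimal sketch on SIMULATED data (hypothetical): 1 normal sample vs 1 tumor sample
library(edgeR)

set.seed(1)
n_genes <- 1000
mu <- 2^runif(n_genes, 3, 10)
counts <- matrix(
  as.integer(rnbinom(n_genes * 2, mu = rep(mu, times = 2), size = 10)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), c("normal1", "tumor1"))
)

y <- DGEList(counts = counts,
             group = factor(c("Normal", "Tumor"), levels = c("Normal", "Tumor")))
keep <- filterByExpr(y)                  # drop lowly expressed genes
y <- y[keep, , keep.lib.sizes = FALSE]   # recompute library sizes after filtering
y <- calcNormFactors(y)                  # TMM normalization

# No replicates, so the dispersion is ASSUMED, not estimated.
bcv <- 0.4                               # assumed BCV for human samples
et <- exactTest(y, dispersion = bcv^2)   # logFC = Tumor relative to Normal
topTags(et)
```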

Reply by sumitguptabt, 3.5 years ago

edgeR was also showing an error during "DGEList" and asking for a minimum of 3 samples for each group, so I just duplicated the single column into 3 and ran it. I do not know if it is the right way to do it

I don't know edgeR, but I don't think this is the right way to go. I'm not entirely sure, though; maybe someone else can give you a more certain answer. As for the comparison, I think a positive logFC means upregulated in Tumor. I'm inferring this from the manual and from the fact that levels(factor(rep(c("Normal","Tumor"), each = 3)))[1] is Normal and levels(factor(rep(c("Normal","Tumor"), each = 3)))[2] is Tumor.
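The factor-level reasoning can be checked directly in base R. `factor()` sorts levels alphabetically by default, so "Normal" is level 1 and "Tumor" is level 2; with `pair = c(1, 2)` the fold change is level 2 relative to level 1. The expression values below are hypothetical and only show the sign convention of a log2 fold change.

```r
group <- factor(rep(c("Normal", "Tumor"), each = 3))
levels(group)   # "Normal" "Tumor": group 1 = Normal, group 2 = Tumor

# Hypothetical normalized expression of one gene (illustration only)
normal_mean <- 50
tumor_mean  <- 200
lfc <- log2(tumor_mean / normal_mean)   # group 2 relative to group 1
lfc   # 2: positive, i.e. higher in Tumor

# Setting the levels explicitly avoids relying on alphabetical order
group2 <- factor(rep(c("Tumor", "Normal"), each = 3), levels = c("Normal", "Tumor"))
levels(group2)
```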

Reply by Ram, 3.5 years ago