Question

EdgeR, a few DEGs found between group comparison, Can I go to further downstream analysis like heat map, GSEA, GO-BP and KEGG with these few DEGs?

0

Entering edit mode

2.1 years ago

TM ▴ 20

Hi, I am doing gene expression profiling of 3 structures(A,B,C). I compared (A vs B&C), (B vs A&C), (C vs A&B). I found 1, 2 and 20 DEG respectively between groups comparison. I already removed non-coding RNA before DEGs analysis as suggested by my senior; however, I still got the same result. Can I go to further downstream analysis like heat map, GO-BP and KEGG with these few DEGs? Please give me suggestion what should I do next? Thank you in advance!!

EdgeR DEGs • 1.1k views

ADD COMMENT • link 2.1 years ago by TM ▴ 20

0

Entering edit mode

What does A vs B&C mean? Please show exactly what contrast you tested.

ADD REPLY • link 2.1 years ago by Gordon Smyth ★ 7.0k

1

Entering edit mode

I have 9 samples from 3 donors. Each donor has 3 samples, representing 3 structures. For example Donor 1 has structure A,B,C Donor 2 has structure A,B,C Donor 3 has structure A,B,C

I would like to see the differentially expressed gene of structure B (taking structure A &C as reference)

I compared structure A with structure B & structure C where I marked structure B & C as "others".

setwd("~/Desktop/RNAseq")
getwd()
counts <- read.table('NewCounting_file.txt')
head(counts)
str(counts)

#create gene name and gene id variable
gene_name <- counts[,2]
gene_id <- counts[,1]
class(counts)
head(gene_id)
head(gene_name)

#Delete gene id and gene name column on counts 
counts[1:2] <- list(NULL)
head(counts)

#create gene count numeric matrix 
counts_mat <- apply(as.matrix.noquote(counts),2,as.numeric)
head(counts_mat)
class(counts_mat)

#Assign gene name as row name 
row.names(counts_mat) <- gene_name
head(counts_mat)

sample_name <- c("A","B","C","A","B","C","A","B","C")
colnames(counts_mat) <- sample_name
groupsA <- c("A","others","others","A","others","others","A", "others","others")
geneexpA <- DGEList(counts=counts_mat, group=groupsA)
library(edgeR)

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("rtracklayer") install.packages("rtracklayer")

gtf <- rtracklayer::import("gencode.v38.annotation.gtf")
table(gtf$gene_type)
lncrna <- unique(gtf[gtf$gene_type=="lncRNA"]$gene_name)
geneexpA[rownames(geneexpA) %in% lncrna,]
filterByExpr(geneexpA,remove.lncrna=TRUE) 
geneexpA <- calcNormFactors(geneexpA)
geneexpA <- estimateCommonDisp(geneexpA)
geneexpA <- estimateTagwiseDisp(geneexpA)
head(geneexpA)
genediffA <- exactTest(geneexpA)
genediffA 
topTags(genediffA)
genediffA$table <- cbind(genediffA$table, FDR=p.adjust(genediffA$table$PValue, method ='fdr'))
head(genediffA)
str(genediffA)
summary(genediffA)
gsignA <- genediffA$table[genediffA$table$FDR<0.03,]
gsignA <- gsignA[order(gsignA$FDR),]
dim(gsignA)
head(gsignA)

gsignA_df <- data.frame(gsignA)
row.names(gsignA_df) <- gsignA_df$ProbeID
gsignA_df <- select(gsignA_df,-ProbeID)
heatmapA(as.metrix(gsignA_df))

ADD REPLY • link 2.1 years ago by TM ▴ 20

0

Entering edit mode

This endeavour has several threads now, see here, @OP, why don't you add here full code and description of your experiment, including groups and n per group, so people can have a look. It is not productive to split something like this over a dozen posts, information only gets diluted.

ADD REPLY • link 2.1 years ago by ATpoint 82k

1

Entering edit mode

I have edited my whole script in the reply as your suggestion. Please let me know if I wrote anything wrong!

ADD REPLY • link 2.1 years ago by TM ▴ 20

0

Entering edit mode

Really sorry, I have tried to do step by step. I have no background in DEGs before and this is my 1st time doing this project. I knew it is a bit messed up. Thank you for your feedback!

ADD REPLY • link 2.1 years ago by TM ▴ 20

score 2 · Answer 1 · 2022-03-21

2

Entering edit mode

2.1 years ago

Gordon Smyth ★ 7.0k

The reason you have so few DE genes is that you are not using edgeR correctly. First, it isn't correct to pretend that B & C are the same group, not even temporarily. You should only define one group factor. Second, you have a paired experiment but you haven't adjusted for baseline differences between the donors. You should be forming:

Group <-  c("A","B","C","A","B","C","A","B","C")
design <- model.matrix(~0+Group+Donor)

Then if you want to compare A to B&C, you form the contrast A - (B+C)/2. The correct approach is easier than what you're doing and statistically far more powerful.

There are several examples of paired analyses in the edgeR and limma User's Guides.

Is there any reason why you are using FDR < 0.03? That is an unusually small FDR cutoff to use. Obviously using a larger cutoff will give more DE genes.

Your data processing could also be simplified, for example:

DataTable <- read.table('NewCounting_file.txt', row.names=2)
y <- DGEList(counts=DataTable[,-1], genes=DataTable[,1,drop=FALSE], group=Group)

ADD COMMENT • link 2.1 years ago by Gordon Smyth ★ 7.0k

0

Entering edit mode

I adjust this script from my senior, I used to try with FDR<0.05 too but it didn't make any differences. Would you mind help me elaborate more with the code I need to change? My understanding on coding is very limited. Thank you so much for your reply, I am stuck with this more than a week already!!

ADD REPLY • link 2.1 years ago by TM ▴ 20

0

Entering edit mode

You've just edited code from your senior without reading any edgeR documentation?