Question

Strange distribution of logFC

0

Entering edit mode

6.5 years ago

SMILE ▴ 170

Dear all,

I have RNAseq data of two groups. After doing differential gene expression using edgeR and DESeq2(the results are similar). I found the distribution of logFC is a little strange, having two peaks in the distribution.

My guess would be that most genes are equally enriched in both goups, i.e. the mode of the logFC distribution (probably Gaussian-like) is around 0.

Perhaps someone can tell you why this is so.

distribution of logFC volcanno plot

Codes can be seen bolow:

P1<-read.table("P1.STAR-counts-stranded.txt",header = TRUE)
row.names(P1) <- P1[,1]
P2<-read.table("P2.STAR-counts-stranded.txt",header = TRUE)
row.names(P2) <- P2[,1]
H3<-read.table("H3.STAR-counts-stranded.txt",header = TRUE)
row.names(H3) <- H3[,1]
H4<-read.table("H4.STAR-counts-stranded.txt",header = TRUE)
row.names(H4) <- H4[,1]
LIST<-list(P1,P2,H3,H4)
COL<-unique(unlist(lapply(LIST, colnames)))
ROW<-unique(unlist(lapply(LIST, rownames)))
TOTAL<-matrix(data=NA,nrow=length(ROW), ncol = length(COL),dimnames = list(ROW,COL))
for (ls in LIST){
  TOTAL[rownames(ls),colnames(ls)]<-as.matrix(ls)
}
write.csv(TOTAL,file="/home/Claire/Desktop/data/STAR_6/TOTAL-PH-STAR-counts-stranded.csv")

edgeR

setwd("/home/Claire/Desktop/data/STAR_6/")
library(edgeR)
HPcount<-read.csv("TOTAL-PH-STAR-counts-stranded.csv",header = TRUE,sep = ",")
HPcounts<-HPcount[,8:11]
samplenames<-c("P1","P2","H3","H4")
HPgeneid<-HPcount[,2]
rownames(HPcounts)<-HPgeneid
colnames(HPcounts) <- samplenames
HPgenes<-HPcount[,2:7]
group <- as.factor(c("P", "P", "H","H"))
DGEList <- DGEList(counts=HPcounts, group=group,genes=HPgenes)
keep <- rowSums(cpm(DGEList)>0.5) >= 2
DGEList_keep <- DGEList[keep , keep.lib.sizes=FALSE]
DGEList_keep_norm <- calcNormFactors(DGEList_keep)

design matrix

design<-model.matrix(~0+group)
colnames(design)<-levels(group)
PvsH<-makeContrasts(P-H, levels = design)

Estimating the dispersion

y<-estimateDisp(DGEList_keep_norm,design)
fit<-glmFit(y,design)

to perform likelihood ration tests

lrt<-glmLRT(fit,contrast = PvsH)

hist and volcanno plot

volcanoData <- cbind(lrt$table$logFC, -log10(lrt$table$PValue))
colnames(volcanoData) <- c("logFC", "negLogPval")
plot(volcanoData, pch=19)
hist(lrt$table$logFC,breaks=100)

rna-seq differential expressed genes edgeR • 2.7k views

ADD COMMENT • link updated 6.5 years ago by Devon Ryan 104k • written 6.5 years ago by SMILE ▴ 170

1

Entering edit mode

I'm also missing the problem here -- it's quite normal to see changes in both directions, i.e. up- as well as down-regulated genes. Is there a reason why you expect only changes into one direction?

ADD REPLY • link 6.5 years ago by Friederike 8.9k

0

Entering edit mode

How do histograms of your raw, normalised, and logged-normalised counts look? I expect to see a bimodal distribution there, too

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

0

Entering edit mode

What do you expect from your experiment? Is it something that is likely to change expression on thousands of genes, or just a few tens or hundreds? Do yu expect large fold-changes, or small?

ADD REPLY • link 6.5 years ago by h.mon 35k

0

Entering edit mode

Cross-posting is discouraged as you are asking for duplicate effort from the volunteers who answer:

https://support.bioconductor.org/p/102905/

ADD REPLY • link 6.5 years ago by Michael Love ★ 2.6k