Question: Strange distribution of logFC
0
gravatar for Claire
10 days ago by
Claire40
Germany
Claire40 wrote:

Dear all,

I have RNAseq data of two groups. After doing differential gene expression using edgeR and DESeq2(the results are similar). I found the distribution of logFC is a little strange, having two peaks in the distribution.

My guess would be that most genes are equally enriched in both goups, i.e. the mode of the logFC distribution (probably Gaussian-like) is around 0.

Perhaps someone can tell you why this is so.

distribution of logFC volcanno plot

Codes can be seen bolow:

P1<-read.table("P1.STAR-counts-stranded.txt",header = TRUE)
row.names(P1) <- P1[,1]
P2<-read.table("P2.STAR-counts-stranded.txt",header = TRUE)
row.names(P2) <- P2[,1]
H3<-read.table("H3.STAR-counts-stranded.txt",header = TRUE)
row.names(H3) <- H3[,1]
H4<-read.table("H4.STAR-counts-stranded.txt",header = TRUE)
row.names(H4) <- H4[,1]
LIST<-list(P1,P2,H3,H4)
COL<-unique(unlist(lapply(LIST, colnames)))
ROW<-unique(unlist(lapply(LIST, rownames)))
TOTAL<-matrix(data=NA,nrow=length(ROW), ncol = length(COL),dimnames = list(ROW,COL))
for (ls in LIST){
  TOTAL[rownames(ls),colnames(ls)]<-as.matrix(ls)
}
write.csv(TOTAL,file="/home/Claire/Desktop/data/STAR_6/TOTAL-PH-STAR-counts-stranded.csv")

edgeR

setwd("/home/Claire/Desktop/data/STAR_6/")
library(edgeR)
HPcount<-read.csv("TOTAL-PH-STAR-counts-stranded.csv",header = TRUE,sep = ",")
HPcounts<-HPcount[,8:11]
samplenames<-c("P1","P2","H3","H4")
HPgeneid<-HPcount[,2]
rownames(HPcounts)<-HPgeneid
colnames(HPcounts) <- samplenames
HPgenes<-HPcount[,2:7]
group <- as.factor(c("P", "P", "H","H"))
DGEList <- DGEList(counts=HPcounts, group=group,genes=HPgenes)
keep <- rowSums(cpm(DGEList)>0.5) >= 2
DGEList_keep <- DGEList[keep , keep.lib.sizes=FALSE]
DGEList_keep_norm <- calcNormFactors(DGEList_keep)

design matrix

design<-model.matrix(~0+group)
colnames(design)<-levels(group)
PvsH<-makeContrasts(P-H, levels = design)

Estimating the dispersion

y<-estimateDisp(DGEList_keep_norm,design)
fit<-glmFit(y,design)

to perform likelihood ration tests

lrt<-glmLRT(fit,contrast = PvsH)

hist and volcanno plot

volcanoData <- cbind(lrt$table$logFC, -log10(lrt$table$PValue))
colnames(volcanoData) <- c("logFC", "negLogPval")
plot(volcanoData, pch=19)
hist(lrt$table$logFC,breaks=100)
ADD COMMENTlink modified 10 days ago by Devon Ryan73k • written 10 days ago by Claire40
1

I'm also missing the problem here -- it's quite normal to see changes in both directions, i.e. up- as well as down-regulated genes. Is there a reason why you expect only changes into one direction?

ADD REPLYlink written 9 days ago by Friederike840

How do histograms of your raw, normalised, and logged-normalised counts look? I expect to see a bimodal distribution there, too

ADD REPLYlink written 10 days ago by Kevin Blighe7.3k

What do you expect from your experiment? Is it something that is likely to change expression on thousands of genes, or just a few tens or hundreds? Do yu expect large fold-changes, or small?

ADD REPLYlink written 10 days ago by h.mon9.2k

Cross-posting is discouraged as you are asking for duplicate effort from the volunteers who answer:

https://support.bioconductor.org/p/102905/

ADD REPLYlink written 10 days ago by Michael Love1.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 977 users visited in the last hour