Question: Strange distribution of logFC
0
gravatar for SMILE
2.7 years ago by
SMILE130
SMILE130 wrote:

Dear all,

I have RNAseq data of two groups. After doing differential gene expression using edgeR and DESeq2(the results are similar). I found the distribution of logFC is a little strange, having two peaks in the distribution.

My guess would be that most genes are equally enriched in both goups, i.e. the mode of the logFC distribution (probably Gaussian-like) is around 0.

Perhaps someone can tell you why this is so.

distribution of logFC volcanno plot

Codes can be seen bolow:

P1<-read.table("P1.STAR-counts-stranded.txt",header = TRUE)
row.names(P1) <- P1[,1]
P2<-read.table("P2.STAR-counts-stranded.txt",header = TRUE)
row.names(P2) <- P2[,1]
H3<-read.table("H3.STAR-counts-stranded.txt",header = TRUE)
row.names(H3) <- H3[,1]
H4<-read.table("H4.STAR-counts-stranded.txt",header = TRUE)
row.names(H4) <- H4[,1]
LIST<-list(P1,P2,H3,H4)
COL<-unique(unlist(lapply(LIST, colnames)))
ROW<-unique(unlist(lapply(LIST, rownames)))
TOTAL<-matrix(data=NA,nrow=length(ROW), ncol = length(COL),dimnames = list(ROW,COL))
for (ls in LIST){
  TOTAL[rownames(ls),colnames(ls)]<-as.matrix(ls)
}
write.csv(TOTAL,file="/home/Claire/Desktop/data/STAR_6/TOTAL-PH-STAR-counts-stranded.csv")

edgeR

setwd("/home/Claire/Desktop/data/STAR_6/")
library(edgeR)
HPcount<-read.csv("TOTAL-PH-STAR-counts-stranded.csv",header = TRUE,sep = ",")
HPcounts<-HPcount[,8:11]
samplenames<-c("P1","P2","H3","H4")
HPgeneid<-HPcount[,2]
rownames(HPcounts)<-HPgeneid
colnames(HPcounts) <- samplenames
HPgenes<-HPcount[,2:7]
group <- as.factor(c("P", "P", "H","H"))
DGEList <- DGEList(counts=HPcounts, group=group,genes=HPgenes)
keep <- rowSums(cpm(DGEList)>0.5) >= 2
DGEList_keep <- DGEList[keep , keep.lib.sizes=FALSE]
DGEList_keep_norm <- calcNormFactors(DGEList_keep)

design matrix

design<-model.matrix(~0+group)
colnames(design)<-levels(group)
PvsH<-makeContrasts(P-H, levels = design)

Estimating the dispersion

y<-estimateDisp(DGEList_keep_norm,design)
fit<-glmFit(y,design)

to perform likelihood ration tests

lrt<-glmLRT(fit,contrast = PvsH)

hist and volcanno plot

volcanoData <- cbind(lrt$table$logFC, -log10(lrt$table$PValue))
colnames(volcanoData) <- c("logFC", "negLogPval")
plot(volcanoData, pch=19)
hist(lrt$table$logFC,breaks=100)
ADD COMMENTlink modified 2.7 years ago by Devon Ryan96k • written 2.7 years ago by SMILE130
1

I'm also missing the problem here -- it's quite normal to see changes in both directions, i.e. up- as well as down-regulated genes. Is there a reason why you expect only changes into one direction?

ADD REPLYlink written 2.7 years ago by Friederike5.9k

How do histograms of your raw, normalised, and logged-normalised counts look? I expect to see a bimodal distribution there, too

ADD REPLYlink written 2.7 years ago by Kevin Blighe63k

What do you expect from your experiment? Is it something that is likely to change expression on thousands of genes, or just a few tens or hundreds? Do yu expect large fold-changes, or small?

ADD REPLYlink written 2.7 years ago by h.mon30k

Cross-posting is discouraged as you are asking for duplicate effort from the volunteers who answer:

https://support.bioconductor.org/p/102905/

ADD REPLYlink written 2.7 years ago by Michael Love2.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 835 users visited in the last hour