Atypical Volcano plot of RNAseq data
1
3
Entering edit mode
4.6 years ago

I performed differential gene expression analysis using EgdeR on RNAseq data and using the DE i generated volcano plot. The volcano plot does not look like a typical volcano plot. I have made volcano plots before and they were normal plots.

I have searched a lot to find the reason why it could be so but unfortunately could not find a good reason. Could someone tell me why this could be?

Rplot

I need to understand why there is a separate line of genes detached from the rest of the volcano plot on the right corner.

Analysis setup:

Organism : Human

Control samples : 3

Infected samples : 3

RUVseq code:

exprs1<- K2_T2
filter<-apply(exprs1,1,function(x) length(x[x>5])>=2) 
filtered<-exprs1[filter,]
head(filtered)
x1<-as.factor(rep(c("M","T2"), each=3))
colnames(exprs1) <- c("K1", "K2", "K3", "T2_1", "T2_2","T2_3") 
set<-newSeqExpressionSet(as.matrix(filtered),phenoData=data.frame(x1, row.names=colnames(filtered)))
head(set)
colors<-brewer.pal(3, "Set1")
plotRLE(set, outline=FALSE, ylim=c(-10,4),col=colors[x1], main     = "Relative Log Expression K/T2")
plotPCA(set, col=colors[x1], cex=1.2, main = "Principal Components Plot  K/T2")
controls <- rownames(set)
differences <- matrix(data=c(1:3, 4:6), byrow=TRUE, nrow=2)
seqRUVs <- RUVs(set, controls, k=1, differences)
plotRLE(seqRUVs, outline=FALSE, ylim=c(-10,10),col=colors[x1],main     = "Relative Log Expression  K/T2")
plotPCA(seqRUVs, col=colors[x1], cex=1.2,main = "Principal Components Plot  K/T2")
design<-model.matrix(~x1, data=pData(seqRUVs))

EdgeR code:

y <- DGEList(counts=counts(seqRUVs), group=x1)
y <- calcNormFactors(y, method="upperquartile")
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
dispersion.true <- 0.2
fit <- glmFit(y, design,dispersion = dispersion.true)
lrt <- glmLRT(fit, coef=2)
topTags(lrt)
row.names(lrt)=gsub("\\..*","",row.names(lrt))         #to remove decimals after the ENSEMBL IDs
lrt$table$Symbol <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
lrt$table$EntrezID <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="ENTREZID",keytype="ENSEMBL",multiVals="first")
lrt$table$Uniprot <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="UNIPROT",keytype="ENSEMBL",multiVals="first")
result<-topTags(lrt,n=nrow(lrt), adjust.method="BH")

mart <- useEnsembl(biomart = "ensembl", 
                   dataset = "hsapiens_gene_ensembl", 
                   mirror = "useast")
IDs<-getBM(attributes=c("ensembl_gene_id","entrezgene_id"), "ensembl_gene_id",row.names(result$table) , mart)
head(IDs)
IDs <-as.data.frame(IDs[!duplicated(IDs$ensembl_gene_id),])
rownames(IDs) <- IDs[,1]
IDs$ensembl_gene_id<-NULL


Check<-merge(result$table, IDs, by=0)
head(Check)
result<-Check[order(Check$FDR, decreasing = FALSE), ]
head(result)

Volcano plot:

library(dplyr)
library(ggplot2)
library("ggrepel")

res<-cbind(result$Symbol,result$logFC, result$PValue,result$FDR)
res<-as.data.frame(res)
head(res)
colnames(res) <- c("Gene", "Fold", "pvalue", "FDR")
res$Fold <-as.numeric(as.character(res[,2]))
res$pvalue <-as.numeric(as.character(res[,3]))
res$FDR <-as.numeric(as.character(res[,4]))
head(res)
str(res)
nrow(res)
res<-res[complete.cases(res), ]
results<-res
results = mutate(res, sig=ifelse(res$FDR<0.05, "FDR<0.05", "Not Sig"))
head(results)


p = ggplot(results, aes(Fold, -log10(FDR))) +geom_point(aes(col=sig)) +scale_color_manual(values=c("red", "black"))
p
R RNA-Seq • 4.0k views
ADD COMMENT
0
Entering edit mode

Are these nominal or corrected p-values ?

ADD REPLY
0
Entering edit mode

These are corrected p-values

ADD REPLY
0
Entering edit mode

IMO that's why you have this "line" of separate genes appears. Could you try your volcano plot with nominal p-values. FDR can still be used for coloring.

ADD REPLY
0
Entering edit mode

Does not change the plot.

ADD REPLY
0
Entering edit mode

Ok then you should add more information on how you generate the plot then (e.g. egedR command, how many samples, experimental desig, etc... ) in order that we can help you.

ADD REPLY
0
Entering edit mode

More details about the code added!

ADD REPLY
0
Entering edit mode

What is your organism? Are you sure you did not apply any filtering on fold-change? Basically you have a set of (I bet lowly expressed, or probably even co-expressed) genes which have a really big fold change. You can filter your data with p-val and and fold-change to see what are these outliers, maybe it will give you some clues. And actually you have (at least) two of these outliers, there is another line closer to the main trunk of the volcano plot.

ADD REPLY
1
Entering edit mode
4.6 years ago
Benn 8.3k

What I would do in such case, is to identify which genes are in that weird line. Then see what's is going on with these genes on raw count level. If that seems okay, see what average CPM values they have in the fit, search until you find some anomalies.

ADD COMMENT
0
Entering edit mode

Thank you Benn, this is exactly what I am doing right now. Will get back to you if I couldn't find anything.

ADD REPLY

Login before adding your answer.

Traffic: 2132 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6