Question: Atypical Volcano plot of RNAseq data
3
gravatar for saamar.rajput
14 months ago by
Germany
saamar.rajput60 wrote:

I performed differential gene expression analysis using EgdeR on RNAseq data and using the DE i generated volcano plot. The volcano plot does not look like a typical volcano plot. I have made volcano plots before and they were normal plots.

I have searched a lot to find the reason why it could be so but unfortunately could not find a good reason. Could someone tell me why this could be?

Rplot

I need to understand why there is a separate line of genes detached from the rest of the volcano plot on the right corner.

Analysis setup:

Organism : Human

Control samples : 3

Infected samples : 3

RUVseq code:

exprs1<- K2_T2
filter<-apply(exprs1,1,function(x) length(x[x>5])>=2) 
filtered<-exprs1[filter,]
head(filtered)
x1<-as.factor(rep(c("M","T2"), each=3))
colnames(exprs1) <- c("K1", "K2", "K3", "T2_1", "T2_2","T2_3") 
set<-newSeqExpressionSet(as.matrix(filtered),phenoData=data.frame(x1, row.names=colnames(filtered)))
head(set)
colors<-brewer.pal(3, "Set1")
plotRLE(set, outline=FALSE, ylim=c(-10,4),col=colors[x1], main     = "Relative Log Expression K/T2")
plotPCA(set, col=colors[x1], cex=1.2, main = "Principal Components Plot  K/T2")
controls <- rownames(set)
differences <- matrix(data=c(1:3, 4:6), byrow=TRUE, nrow=2)
seqRUVs <- RUVs(set, controls, k=1, differences)
plotRLE(seqRUVs, outline=FALSE, ylim=c(-10,10),col=colors[x1],main     = "Relative Log Expression  K/T2")
plotPCA(seqRUVs, col=colors[x1], cex=1.2,main = "Principal Components Plot  K/T2")
design<-model.matrix(~x1, data=pData(seqRUVs))

EdgeR code:

y <- DGEList(counts=counts(seqRUVs), group=x1)
y <- calcNormFactors(y, method="upperquartile")
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
dispersion.true <- 0.2
fit <- glmFit(y, design,dispersion = dispersion.true)
lrt <- glmLRT(fit, coef=2)
topTags(lrt)
row.names(lrt)=gsub("\\..*","",row.names(lrt))         #to remove decimals after the ENSEMBL IDs
lrt$table$Symbol <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
lrt$table$EntrezID <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="ENTREZID",keytype="ENSEMBL",multiVals="first")
lrt$table$Uniprot <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="UNIPROT",keytype="ENSEMBL",multiVals="first")
result<-topTags(lrt,n=nrow(lrt), adjust.method="BH")

mart <- useEnsembl(biomart = "ensembl", 
                   dataset = "hsapiens_gene_ensembl", 
                   mirror = "useast")
IDs<-getBM(attributes=c("ensembl_gene_id","entrezgene_id"), "ensembl_gene_id",row.names(result$table) , mart)
head(IDs)
IDs <-as.data.frame(IDs[!duplicated(IDs$ensembl_gene_id),])
rownames(IDs) <- IDs[,1]
IDs$ensembl_gene_id<-NULL


Check<-merge(result$table, IDs, by=0)
head(Check)
result<-Check[order(Check$FDR, decreasing = FALSE), ]
head(result)

Volcano plot:

library(dplyr)
library(ggplot2)
library("ggrepel")

res<-cbind(result$Symbol,result$logFC, result$PValue,result$FDR)
res<-as.data.frame(res)
head(res)
colnames(res) <- c("Gene", "Fold", "pvalue", "FDR")
res$Fold <-as.numeric(as.character(res[,2]))
res$pvalue <-as.numeric(as.character(res[,3]))
res$FDR <-as.numeric(as.character(res[,4]))
head(res)
str(res)
nrow(res)
res<-res[complete.cases(res), ]
results<-res
results = mutate(res, sig=ifelse(res$FDR<0.05, "FDR<0.05", "Not Sig"))
head(results)


p = ggplot(results, aes(Fold, -log10(FDR))) +geom_point(aes(col=sig)) +scale_color_manual(values=c("red", "black"))
p
rna-seq R • 1.0k views
ADD COMMENTlink modified 14 months ago • written 14 months ago by saamar.rajput60

Are these nominal or corrected p-values ?

ADD REPLYlink written 14 months ago by Nicolas Rosewick9.2k

These are corrected p-values

ADD REPLYlink written 14 months ago by saamar.rajput60

IMO that's why you have this "line" of separate genes appears. Could you try your volcano plot with nominal p-values. FDR can still be used for coloring.

ADD REPLYlink written 14 months ago by Nicolas Rosewick9.2k

Does not change the plot.

ADD REPLYlink written 14 months ago by saamar.rajput60

Ok then you should add more information on how you generate the plot then (e.g. egedR command, how many samples, experimental desig, etc... ) in order that we can help you.

ADD REPLYlink written 14 months ago by Nicolas Rosewick9.2k

More details about the code added!

ADD REPLYlink written 14 months ago by saamar.rajput60

What is your organism? Are you sure you did not apply any filtering on fold-change? Basically you have a set of (I bet lowly expressed, or probably even co-expressed) genes which have a really big fold change. You can filter your data with p-val and and fold-change to see what are these outliers, maybe it will give you some clues. And actually you have (at least) two of these outliers, there is another line closer to the main trunk of the volcano plot.

ADD REPLYlink modified 14 months ago • written 14 months ago by grant.hovhannisyan2.0k
1
gravatar for Benn
14 months ago by
Benn8.0k
Netherlands
Benn8.0k wrote:

What I would do in such case, is to identify which genes are in that weird line. Then see what's is going on with these genes on raw count level. If that seems okay, see what average CPM values they have in the fit, search until you find some anomalies.

ADD COMMENTlink written 14 months ago by Benn8.0k

Thank you Benn, this is exactly what I am doing right now. Will get back to you if I couldn't find anything.

ADD REPLYlink written 14 months ago by saamar.rajput60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1649 users visited in the last hour