Question: Atypical Volcano plot of RNAseq data
3
gravatar for saamar.rajput
5 weeks ago by
Germany
saamar.rajput50 wrote:

I performed differential gene expression analysis using EgdeR on RNAseq data and using the DE i generated volcano plot. The volcano plot does not look like a typical volcano plot. I have made volcano plots before and they were normal plots.

I have searched a lot to find the reason why it could be so but unfortunately could not find a good reason. Could someone tell me why this could be?

Rplot

I need to understand why there is a separate line of genes detached from the rest of the volcano plot on the right corner.

Analysis setup:

Organism : Human

Control samples : 3

Infected samples : 3

RUVseq code:

exprs1<- K2_T2
filter<-apply(exprs1,1,function(x) length(x[x>5])>=2) 
filtered<-exprs1[filter,]
head(filtered)
x1<-as.factor(rep(c("M","T2"), each=3))
colnames(exprs1) <- c("K1", "K2", "K3", "T2_1", "T2_2","T2_3") 
set<-newSeqExpressionSet(as.matrix(filtered),phenoData=data.frame(x1, row.names=colnames(filtered)))
head(set)
colors<-brewer.pal(3, "Set1")
plotRLE(set, outline=FALSE, ylim=c(-10,4),col=colors[x1], main     = "Relative Log Expression K/T2")
plotPCA(set, col=colors[x1], cex=1.2, main = "Principal Components Plot  K/T2")
controls <- rownames(set)
differences <- matrix(data=c(1:3, 4:6), byrow=TRUE, nrow=2)
seqRUVs <- RUVs(set, controls, k=1, differences)
plotRLE(seqRUVs, outline=FALSE, ylim=c(-10,10),col=colors[x1],main     = "Relative Log Expression  K/T2")
plotPCA(seqRUVs, col=colors[x1], cex=1.2,main = "Principal Components Plot  K/T2")
design<-model.matrix(~x1, data=pData(seqRUVs))

EdgeR code:

y <- DGEList(counts=counts(seqRUVs), group=x1)
y <- calcNormFactors(y, method="upperquartile")
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
dispersion.true <- 0.2
fit <- glmFit(y, design,dispersion = dispersion.true)
lrt <- glmLRT(fit, coef=2)
topTags(lrt)
row.names(lrt)=gsub("\\..*","",row.names(lrt))         #to remove decimals after the ENSEMBL IDs
lrt$table$Symbol <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
lrt$table$EntrezID <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="ENTREZID",keytype="ENSEMBL",multiVals="first")
lrt$table$Uniprot <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="UNIPROT",keytype="ENSEMBL",multiVals="first")
result<-topTags(lrt,n=nrow(lrt), adjust.method="BH")

mart <- useEnsembl(biomart = "ensembl", 
                   dataset = "hsapiens_gene_ensembl", 
                   mirror = "useast")
IDs<-getBM(attributes=c("ensembl_gene_id","entrezgene_id"), "ensembl_gene_id",row.names(result$table) , mart)
head(IDs)
IDs <-as.data.frame(IDs[!duplicated(IDs$ensembl_gene_id),])
rownames(IDs) <- IDs[,1]
IDs$ensembl_gene_id<-NULL


Check<-merge(result$table, IDs, by=0)
head(Check)
result<-Check[order(Check$FDR, decreasing = FALSE), ]
head(result)

Volcano plot:

library(dplyr)
library(ggplot2)
library("ggrepel")

res<-cbind(result$Symbol,result$logFC, result$PValue,result$FDR)
res<-as.data.frame(res)
head(res)
colnames(res) <- c("Gene", "Fold", "pvalue", "FDR")
res$Fold <-as.numeric(as.character(res[,2]))
res$pvalue <-as.numeric(as.character(res[,3]))
res$FDR <-as.numeric(as.character(res[,4]))
head(res)
str(res)
nrow(res)
res<-res[complete.cases(res), ]
results<-res
results = mutate(res, sig=ifelse(res$FDR<0.05, "FDR<0.05", "Not Sig"))
head(results)


p = ggplot(results, aes(Fold, -log10(FDR))) +geom_point(aes(col=sig)) +scale_color_manual(values=c("red", "black"))
p
rna-seq R • 249 views
ADD COMMENTlink modified 28 days ago • written 5 weeks ago by saamar.rajput50

Are these nominal or corrected p-values ?

ADD REPLYlink written 5 weeks ago by Nicolas Rosewick8.3k

These are corrected p-values

ADD REPLYlink written 5 weeks ago by saamar.rajput50

IMO that's why you have this "line" of separate genes appears. Could you try your volcano plot with nominal p-values. FDR can still be used for coloring.

ADD REPLYlink written 5 weeks ago by Nicolas Rosewick8.3k

Does not change the plot.

ADD REPLYlink written 5 weeks ago by saamar.rajput50

Ok then you should add more information on how you generate the plot then (e.g. egedR command, how many samples, experimental desig, etc... ) in order that we can help you.

ADD REPLYlink written 5 weeks ago by Nicolas Rosewick8.3k

More details about the code added!

ADD REPLYlink written 4 weeks ago by saamar.rajput50

saamar, if you want more help, provide more information.

ADD REPLYlink written 5 weeks ago by Kevin Blighe50k

Could you identify the reason now?

ADD REPLYlink written 4 weeks ago by saamar.rajput50

Thanks! Are you sure that it is the same when you plot the -log10(pvalue)? Did you filter for low counts (i.e. filter them out of your dataset) prior to normalisation?

How did your PCA bi-plots appear?

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Kevin Blighe50k

Yes it comes out the same plot when I use -log10(pvalue), I filtered out low counts. The PCA plot is nice, controls clustered together and infected clustered together.

ADD REPLYlink written 4 weeks ago by saamar.rajput50

Use the following and it will improve:

y <- calcNormFactors(y, method="TMM")
ADD REPLYlink written 4 weeks ago by Kevin Blighe50k

The volcano plot comes out to be same like before.

ADD REPLYlink written 28 days ago by saamar.rajput50
1

Thank you. Well, this back and forth is getting us nowhere. Please make a sample dataset available so that I or somebody else can test it.

ADD REPLYlink written 28 days ago by Kevin Blighe50k

how do I provide you with the table of my data?

ADD REPLYlink written 28 days ago by saamar.rajput50

Now you should also reply to Benn

ADD REPLYlink modified 28 days ago • written 28 days ago by Kevin Blighe50k
1

...and you never replied to grant.

ADD REPLYlink written 28 days ago by Kevin Blighe50k

What is your organism? Are you sure you did not apply any filtering on fold-change? Basically you have a set of (I bet lowly expressed, or probably even co-expressed) genes which have a really big fold change. You can filter your data with p-val and and fold-change to see what are these outliers, maybe it will give you some clues. And actually you have (at least) two of these outliers, there is another line closer to the main trunk of the volcano plot.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by grant.hovhannisyan1.8k
1
gravatar for Benn
28 days ago by
Benn7.8k
Netherlands
Benn7.8k wrote:

What I would do in such case, is to identify which genes are in that weird line. Then see what's is going on with these genes on raw count level. If that seems okay, see what average CPM values they have in the fit, search until you find some anomalies.

ADD COMMENTlink written 28 days ago by Benn7.8k

Thank you Benn, this is exactly what I am doing right now. Will get back to you if I couldn't find anything.

ADD REPLYlink written 28 days ago by saamar.rajput50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2253 users visited in the last hour