Question: Atypical Volcano plot of RNAseq data
3
gravatar for saamar.rajput
6 months ago by
Germany
saamar.rajput60 wrote:

I performed differential gene expression analysis using EgdeR on RNAseq data and using the DE i generated volcano plot. The volcano plot does not look like a typical volcano plot. I have made volcano plots before and they were normal plots.

I have searched a lot to find the reason why it could be so but unfortunately could not find a good reason. Could someone tell me why this could be?

Rplot

I need to understand why there is a separate line of genes detached from the rest of the volcano plot on the right corner.

Analysis setup:

Organism : Human

Control samples : 3

Infected samples : 3

RUVseq code:

exprs1<- K2_T2
filter<-apply(exprs1,1,function(x) length(x[x>5])>=2) 
filtered<-exprs1[filter,]
head(filtered)
x1<-as.factor(rep(c("M","T2"), each=3))
colnames(exprs1) <- c("K1", "K2", "K3", "T2_1", "T2_2","T2_3") 
set<-newSeqExpressionSet(as.matrix(filtered),phenoData=data.frame(x1, row.names=colnames(filtered)))
head(set)
colors<-brewer.pal(3, "Set1")
plotRLE(set, outline=FALSE, ylim=c(-10,4),col=colors[x1], main     = "Relative Log Expression K/T2")
plotPCA(set, col=colors[x1], cex=1.2, main = "Principal Components Plot  K/T2")
controls <- rownames(set)
differences <- matrix(data=c(1:3, 4:6), byrow=TRUE, nrow=2)
seqRUVs <- RUVs(set, controls, k=1, differences)
plotRLE(seqRUVs, outline=FALSE, ylim=c(-10,10),col=colors[x1],main     = "Relative Log Expression  K/T2")
plotPCA(seqRUVs, col=colors[x1], cex=1.2,main = "Principal Components Plot  K/T2")
design<-model.matrix(~x1, data=pData(seqRUVs))

EdgeR code:

y <- DGEList(counts=counts(seqRUVs), group=x1)
y <- calcNormFactors(y, method="upperquartile")
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
dispersion.true <- 0.2
fit <- glmFit(y, design,dispersion = dispersion.true)
lrt <- glmLRT(fit, coef=2)
topTags(lrt)
row.names(lrt)=gsub("\\..*","",row.names(lrt))         #to remove decimals after the ENSEMBL IDs
lrt$table$Symbol <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
lrt$table$EntrezID <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="ENTREZID",keytype="ENSEMBL",multiVals="first")
lrt$table$Uniprot <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="UNIPROT",keytype="ENSEMBL",multiVals="first")
result<-topTags(lrt,n=nrow(lrt), adjust.method="BH")

mart <- useEnsembl(biomart = "ensembl", 
                   dataset = "hsapiens_gene_ensembl", 
                   mirror = "useast")
IDs<-getBM(attributes=c("ensembl_gene_id","entrezgene_id"), "ensembl_gene_id",row.names(result$table) , mart)
head(IDs)
IDs <-as.data.frame(IDs[!duplicated(IDs$ensembl_gene_id),])
rownames(IDs) <- IDs[,1]
IDs$ensembl_gene_id<-NULL


Check<-merge(result$table, IDs, by=0)
head(Check)
result<-Check[order(Check$FDR, decreasing = FALSE), ]
head(result)

Volcano plot:

library(dplyr)
library(ggplot2)
library("ggrepel")

res<-cbind(result$Symbol,result$logFC, result$PValue,result$FDR)
res<-as.data.frame(res)
head(res)
colnames(res) <- c("Gene", "Fold", "pvalue", "FDR")
res$Fold <-as.numeric(as.character(res[,2]))
res$pvalue <-as.numeric(as.character(res[,3]))
res$FDR <-as.numeric(as.character(res[,4]))
head(res)
str(res)
nrow(res)
res<-res[complete.cases(res), ]
results<-res
results = mutate(res, sig=ifelse(res$FDR<0.05, "FDR<0.05", "Not Sig"))
head(results)


p = ggplot(results, aes(Fold, -log10(FDR))) +geom_point(aes(col=sig)) +scale_color_manual(values=c("red", "black"))
p
rna-seq R • 523 views
ADD COMMENTlink modified 6 months ago • written 6 months ago by saamar.rajput60

Are these nominal or corrected p-values ?

ADD REPLYlink written 6 months ago by Nicolas Rosewick8.7k

These are corrected p-values

ADD REPLYlink written 6 months ago by saamar.rajput60

IMO that's why you have this "line" of separate genes appears. Could you try your volcano plot with nominal p-values. FDR can still be used for coloring.

ADD REPLYlink written 6 months ago by Nicolas Rosewick8.7k

Does not change the plot.

ADD REPLYlink written 6 months ago by saamar.rajput60

Ok then you should add more information on how you generate the plot then (e.g. egedR command, how many samples, experimental desig, etc... ) in order that we can help you.

ADD REPLYlink written 6 months ago by Nicolas Rosewick8.7k

More details about the code added!

ADD REPLYlink written 6 months ago by saamar.rajput60

saamar, if you want more help, provide more information.

ADD REPLYlink written 6 months ago by Kevin Blighe56k

Could you identify the reason now?

ADD REPLYlink written 6 months ago by saamar.rajput60

Thanks! Are you sure that it is the same when you plot the -log10(pvalue)? Did you filter for low counts (i.e. filter them out of your dataset) prior to normalisation?

How did your PCA bi-plots appear?

ADD REPLYlink modified 6 months ago • written 6 months ago by Kevin Blighe56k

Yes it comes out the same plot when I use -log10(pvalue), I filtered out low counts. The PCA plot is nice, controls clustered together and infected clustered together.

ADD REPLYlink written 6 months ago by saamar.rajput60

Use the following and it will improve:

y <- calcNormFactors(y, method="TMM")
ADD REPLYlink written 6 months ago by Kevin Blighe56k

The volcano plot comes out to be same like before.

ADD REPLYlink written 6 months ago by saamar.rajput60
1

Thank you. Well, this back and forth is getting us nowhere. Please make a sample dataset available so that I or somebody else can test it.

ADD REPLYlink written 6 months ago by Kevin Blighe56k

how do I provide you with the table of my data?

ADD REPLYlink written 6 months ago by saamar.rajput60

Now you should also reply to Benn

ADD REPLYlink modified 6 months ago • written 6 months ago by Kevin Blighe56k
1

...and you never replied to grant.

ADD REPLYlink written 6 months ago by Kevin Blighe56k

What is your organism? Are you sure you did not apply any filtering on fold-change? Basically you have a set of (I bet lowly expressed, or probably even co-expressed) genes which have a really big fold change. You can filter your data with p-val and and fold-change to see what are these outliers, maybe it will give you some clues. And actually you have (at least) two of these outliers, there is another line closer to the main trunk of the volcano plot.

ADD REPLYlink modified 6 months ago • written 6 months ago by grant.hovhannisyan1.9k
1
gravatar for Benn
6 months ago by
Benn7.9k
Netherlands
Benn7.9k wrote:

What I would do in such case, is to identify which genes are in that weird line. Then see what's is going on with these genes on raw count level. If that seems okay, see what average CPM values they have in the fit, search until you find some anomalies.

ADD COMMENTlink written 6 months ago by Benn7.9k

Thank you Benn, this is exactly what I am doing right now. Will get back to you if I couldn't find anything.

ADD REPLYlink written 6 months ago by saamar.rajput60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1832 users visited in the last hour