Question: Need help fixing overlapping gene labels on a volcano plot
Adeler001 wrote (13 months ago):

I'm doing an RNA-seq analysis and I'm trying to show my results as a volcano plot in RStudio. I used the following script to make the plot:

#Import count table#
countdata <- read.table("family1301RNA-seq.countsfixed.txt", header=TRUE, row.names=1)

#Convert to matrix#
countdata <- as.matrix(countdata)
head(countdata)

#Assign condition (first three samples are unaffected, last three are affected) and subject#
condition <- factor(c("unaffected","unaffected","unaffected","affected","affected","affected"), levels=c("unaffected","affected"))
subject <- factor(c("1","2","3","3","2","1"))

library(DESeq2)

#Create a coldata frame and instantiate the DESeqDataSet#
coldata <- data.frame(row.names=colnames(countdata), subject, condition)

dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~ subject + condition)
dds

#Pre-filtering to keep only rows that have more than 1 read in total#
keep <- rowSums(counts(dds)) > 1
dds <- dds[keep,]

#Run the DESeq#
dds <- DESeq(dds)

#Regularized log transformation for clustering/heatmaps#
rld <- rlogTransformation(dds)
head(assay(rld))
hist(assay(rld))

#Colors for plots below#
library(RColorBrewer)
(mycols <- brewer.pal(8, "Dark2")[1:length(unique(condition))])

#Sample distance heatmap#
sampleDists <- as.matrix(dist(t(assay(rld))))
library(gplots)
png("qc-heatmap_baker.png", w=1000, h=1000, pointsize=20)
heatmap.2(as.matrix(sampleDists), key=F, trace="none", col=colorpanel(100, "black", "white"), ColSideColors=mycols[condition], RowSideColors=mycols[condition], margin=c(10, 10), main="Sample Distance Matrix")
dev.off()

#Principal components analysis#
rld_pca <- function (rld, intgroup = "condition", ntop = 500, colors=NULL, legendpos="bottomleft", main="PCA Biplot", textcx=1, ...) {
  require(genefilter)
  require(calibrate)
  require(RColorBrewer)
  rv = rowVars(assay(rld))
  select = order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  pca = prcomp(t(assay(rld)[select, ]))
  fac = factor(apply(as.data.frame(colData(rld)[, intgroup, drop = FALSE]), 1, paste, collapse = " : "))
  if (is.null(colors)) {
    if (nlevels(fac) >= 3) {
      colors = brewer.pal(nlevels(fac), "Paired")
    } else {
      colors = c("black", "red")
    }
  }
  pc1var <- round(summary(pca)$importance[2,1]*100, digits=1)
  pc2var <- round(summary(pca)$importance[2,2]*100, digits=1)
  pc1lab <- paste0("PC1 (",as.character(pc1var),"%)")
  pc2lab <- paste0("PC2 (",as.character(pc2var),"%)")
  plot(PC2~PC1, data=as.data.frame(pca$x), bg=colors[fac], pch=21, xlab=pc1lab, ylab=pc2lab, main=main, ...)
  with(as.data.frame(pca$x), textxy(PC1, PC2, labs=rownames(as.data.frame(pca$x)), cex=textcx))
  legend(legendpos, legend=levels(fac), col=colors, pch=20)
}

png("qc-pca.png", 1000, 1000, pointsize=20)
rld_pca(rld, colors=mycols, intgroup="condition", xlim=c(-20, 20))
dev.off()

#Get differential expression results#
res <- results(dds)
table(res$padj<0.05)

#Order by adjusted p-value#
res <- res[order(res$padj), ]

#Merge with normalized count data#
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)
names(resdata)[1] <- "Gene"
head(resdata)

#Write results (note: resdata holds all genes ordered by adjusted p-value, not only those with FDR<0.05)#
write.csv(resdata, file="sig_diffexpr-results.csv")

#Volcano plot with significant DE genes#
volcanoplot <- function (res, lfcthresh=2, sigthresh=0.05, main="Volcano Plot", legendpos="bottomright", labelsig=TRUE, textcx=1, ...) {
  with(res, plot(log2FoldChange, -log10(padj), pch=20, main=main, ...))
  with(subset(res, padj<sigthresh), points(log2FoldChange, -log10(padj), pch=20, col="red", ...))
  with(subset(res, abs(log2FoldChange)>lfcthresh), points(log2FoldChange, -log10(padj), pch=20, col="orange", ...))
  with(subset(res, padj<sigthresh & abs(log2FoldChange)>lfcthresh), points(log2FoldChange, -log10(padj), pch=20, col="green", ...))
  if (labelsig) {
    require(calibrate)
    with(subset(res, padj<sigthresh & abs(log2FoldChange)>lfcthresh), textxy(log2FoldChange, -log10(padj), labs=Gene, cex=textcx, ...))
  }
  legend(legendpos, xjust=1, yjust=1, legend=c(paste("FDR<",sigthresh,sep=""), paste("|LogFC|>",lfcthresh,sep=""), "both"), pch=20, col=c("red","orange","green"))
}
png("diffexpr-volcanoplot.png", 1200, 1000, pointsize=20)
volcanoplot(resdata, lfcthresh=1, sigthresh=0.05, textcx=.8, xlim=c(-3, 3))
dev.off()

I am a novice RStudio user. The issue I'm having is that the gene name labels on my volcano plot overlap, which makes them unreadable. How can I prevent the gene labels from overlapping?


Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
Thank you!

written 13 months ago by genomax

Hello Adeler001!

It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/124173/

This is typically not recommended as it runs the risk of annoying people in both communities.

written 13 months ago by ATpoint
jared.andrews07 wrote (13 months ago):

Rather than try to parse that plotting function, I'm just going to recommend EnhancedVolcano. It will take care of the label overlap and drastically simplify plotting for you.
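
For reference, a minimal sketch of what that could look like. It assumes a data frame shaped like the `resdata` from the question (columns `Gene`, `log2FoldChange`, `padj`); the toy data below is made up purely so the snippet runs on its own, and the thresholds simply mirror the ones in the question. `drawConnectors = TRUE` asks EnhancedVolcano to move labels away from each other and connect them to their points with line segments.

```r
# Toy stand-in for the `resdata` data frame built in the question.
# (Hypothetical values; replace with your real DESeq2 results.)
set.seed(1)
resdata <- data.frame(
  Gene = paste0("gene", 1:200),
  log2FoldChange = rnorm(200, sd = 1.5),
  padj = runif(200)^3
)

# DESeq2 can report NA for padj; drop those rows before plotting.
resdata <- resdata[!is.na(resdata$padj), ]

if (requireNamespace("EnhancedVolcano", quietly = TRUE)) {
  library(EnhancedVolcano)

  p <- EnhancedVolcano(
    resdata,
    lab = resdata$Gene,        # labels to draw
    x = "log2FoldChange",      # column used for the x axis
    y = "padj",                # column used for the y axis (-log10 is applied)
    pCutoff = 0.05,            # same significance threshold as in the question
    FCcutoff = 1,              # same |log2FC| threshold as in the question
    labSize = 3,
    drawConnectors = TRUE,     # repel labels and link them to their points
    title = "Volcano Plot"
  )

  # EnhancedVolcano returns a ggplot object, so ggsave() can write it out.
  ggplot2::ggsave("diffexpr-volcanoplot.png", p, width = 12, height = 10)
} else {
  message("EnhancedVolcano is not installed; install it from Bioconductor first.")
}
```

If there are still too many labels, raising `FCcutoff`, lowering `pCutoff`, or passing a short vector of gene names via `selectLab` reduces how many get drawn.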


OK, thanks, I'll try that.

written 12 months ago by Adeler001