Question

How to just add some selected gene names to a volcano plot

1

Entering edit mode

7.5 years ago

Farbod ★ 3.4k

Dear Biostars, Hi

I have a volcano plot (obtained from edgeR). I also have some selected annotated genes that I like to highlight them by showing only their name on that plot.

I have used the valuable script/code from Biostars (thank you @WouterDeCoster and @venu and others).

As most of the lines of the first column in my counts.matrix is empty (I have only about 15 names), I received some errors. For example, in the "Head" example below, I have only PAC1 and Gh as Gene name and the other lines have "space" instead of any Gene name!

would you please check my script and my counts.matrix example and help me in this regard? Thanks

NOTE:

1- Head of my counts.matrix file:

Gene    external_gene_name  logFC   logCPM  P.Value FDR
PAC1    TRINITY_c0_g2_i4    -13.8732718193368   4.61855606743036    3.03811154383156e-30    1.87897780840196e-24
    TRINITY_c5_g3_i4    13.2697070220115    4.01618042373378    1.21936893870205e-27    3.7707094407506e-22
GH  TRINITY_c9_g2_i2    11.2269211011449    3.82178000481344    3.20153046019011e-26    6.60015780727773e-21
    TRINITY_c0_g1_i1    12.855465908739 3.60335598332695    2.98018910927419e-24    4.60788644555924e-19
    TRINITY_c0_g1_i1    -11.8727923903381   2.62735890209779    1.81499535997098e-20    2.24503673057178e-15
    TRINITY_c6_g1_i6    10.3243429332541    2.89490936855587    2.16106621247948e-19    2.22758743227662e-14
    TRINITY_g3_i2   -12.0236467730875   2.77706530918009    2.63166646685828e-19    2.32514875441625e-14
    TRINITY_c3_g2_i6    -11.2400737828536   2.00142608642608    1.40238473598696e-18    1.0841643566014e-13     
    TRINITY_c1_g1_i1    -9.45052158373113   2.158090406858  1.66695714263313e-18    1.14551257449686e-13

2- my code:

    # Load packages
    library(dplyr)
    library(ggplot2)
    library(ggrepel)

    # Read data 
    data <- read.table("trans.counts.matrix.J_vs_M.edgeR.DE_results", header=TRUE)

    data$threshold = as.factor(data$FDR < 0.001)


    data = mutate(data, sig=ifelse(data$FDR<0.001, "FDR<0.001", "Not Sig"))

    p = ggplot(data, aes(logFC, -log10(P.Value))) +
      geom_point(aes(col=sig)) +
      scale_color_manual(values=c("red", "black"))

    p
  p+geom_text_repel(data=filter(data, FDR<0.001), aes(label=Gene))

R ggplot2 volcano plot • 11k views

ADD COMMENT • link updated 6.4 years ago by zx8754 12k • written 7.5 years ago by Farbod ★ 3.4k

1

Entering edit mode

Add another filter to filter out genes (column) without name some thing like df$gene!="" in addition to p-value cutoff. This would show only those genes with names and with FDR below the cutoff.

Data has significant FDR values and significant genes are filtered by FDR (from OP code). Hence it is Advisable to plot FC against FDR instead of -log10P (IMO).
From OP' data (OP filtered data by 0.05 FDR, code filters the data by mean (FDR)):

test = read.csv(   "test.txt",   stringsAsFactors = F,   sep = "\t",  strip.white = T ) 
test2 = test[test$FDR < mean (test$FDR) & test$Gene != "",] 
library(dplyr) 
library(ggplot2)

ggplot(test, aes(logFC, FDR, color = FDR < mean(FDR))) +  
geom_point(size = 4) +   
scale_color_manual(values = c("blue", "red", "darkgreen")) +   
geom_text( data = test2,
     aes (logFC, FDR, label = Gene, size = 4, colour = ""),  vjust = -1.2, hjust = "center") +   
theme_bw() +   
theme(legend.position = "none")

ADD REPLY • link 7.5 years ago by cpad0112 21k

3

Entering edit mode

7.2 years ago

Kevin Blighe 89k

You can do this with my new Bioconductor package: EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling

Specifically take a look at Section 4.8 Only label key transcripts of the vignette.

EnhancedVolcano(res2,
        lab = rownames(res2),
        x = "log2FoldChange",
        y = "padj",
        selectLab = c("ENSG00000106565","ENSG00000187758"),
        ... ...
)

Kevin

ADD COMMENT • link 7.2 years ago by Kevin Blighe 89k

score 2 · Accepted Answer · 2018-05-12

2

Entering edit mode

7.5 years ago

venu 7.1k

Change this line of code

data <- read.table("trans.counts.matrix.J_vs_M.edgeR.DE_results", header=TRUE)

to

data <- read.delim("trans.counts.matrix.J_vs_M.edgeR.DE_results", header=TRUE, na.strings=c("","NA"))

This adds <NA> to empty cells which will be ignored in the plot and only gene names will be added.

ADD COMMENT • link 7.5 years ago by venu 7.1k

0

Entering edit mode

Dear @venu, Hi. Great answer! Thank you.

ADD REPLY • link 7.5 years ago by Farbod ★ 3.4k

0

Entering edit mode

Another problem;

I have used "library(ggrepel)" and " p+geom_text_repel(data= . . . " in the code but the name of each gene is just on it's spot without any "arrow", what must I do ? as the names are on each dot and unreadable.