Dear Biostars,
I have a volcano plot (made from edgeR results). I also have a few selected annotated genes that I would like to highlight by showing only their names on the plot.
I have used the valuable script/code from Biostars (thank you @WouterDeCoster, @venu and others).
Because most rows in the first column of my counts.matrix are empty (I have only about 15 names), I get some errors. For example, in the head shown below only PAC1 and GH have a gene name; the other rows are blank instead of containing a gene name.
Could you please check my script and my counts.matrix example and help me with this? Thanks.
NOTE:
1- Head of my counts.matrix file:
Gene    external_gene_name  logFC   logCPM  P.Value FDR
PAC1    TRINITY_c0_g2_i4    -13.8732718193368   4.61855606743036    3.03811154383156e-30    1.87897780840196e-24
    TRINITY_c5_g3_i4    13.2697070220115    4.01618042373378    1.21936893870205e-27    3.7707094407506e-22
GH  TRINITY_c9_g2_i2    11.2269211011449    3.82178000481344    3.20153046019011e-26    6.60015780727773e-21
    TRINITY_c0_g1_i1    12.855465908739 3.60335598332695    2.98018910927419e-24    4.60788644555924e-19
    TRINITY_c0_g1_i1    -11.8727923903381   2.62735890209779    1.81499535997098e-20    2.24503673057178e-15
    TRINITY_c6_g1_i6    10.3243429332541    2.89490936855587    2.16106621247948e-19    2.22758743227662e-14
    TRINITY_g3_i2   -12.0236467730875   2.77706530918009    2.63166646685828e-19    2.32514875441625e-14
    TRINITY_c3_g2_i6    -11.2400737828536   2.00142608642608    1.40238473598696e-18    1.0841643566014e-13     
    TRINITY_c1_g1_i1    -9.45052158373113   2.158090406858  1.66695714263313e-18    1.14551257449686e-13
2- my code:
    # Load packages
    library(dplyr)
    library(ggplot2)
    library(ggrepel)
    # Read data
    data <- read.table("trans.counts.matrix.J_vs_M.edgeR.DE_results", header=TRUE)
    # Flag significant rows (FDR < 0.001)
    data$threshold <- as.factor(data$FDR < 0.001)
    data <- mutate(data, sig=ifelse(FDR<0.001, "FDR<0.001", "Not Sig"))
    # Volcano plot: logFC on x, -log10(P.Value) on y, coloured by significance
    p <- ggplot(data, aes(logFC, -log10(P.Value))) +
      geom_point(aes(col=sig)) +
      scale_color_manual(values=c("red", "black"))
    p
    # Label the significant points with the Gene column
    p + geom_text_repel(data=filter(data, FDR<0.001), aes(label=Gene))
                    
                
                
Add a second condition to the filter that drops rows whose Gene column is empty, something like data$Gene != "", in addition to the FDR cutoff. The plot will then label only genes that have a name and an FDR below the cutoff.
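
A minimal sketch of that suggestion, not a tested fix for the real file. The small data frame is a made-up stand-in: its values are rounded from the head in the question, and the row named "geneX" is purely hypothetical. The `labelled` name and the `trimws()`/`is.na()` guards are my own additions, in case the blank cells contain spaces or are read as NA. If the real file is tab-separated (an assumption), reading it with `read.delim()` may also keep the blank Gene cells from shifting the columns.

```r
library(dplyr)
library(ggplot2)
library(ggrepel)

# Toy stand-in for the edgeR results (rounded values; "geneX" is hypothetical).
data <- data.frame(
  Gene = c("PAC1", "", "GH", "", "geneX"),
  external_gene_name = c("TRINITY_c0_g2_i4", "TRINITY_c5_g3_i4",
                         "TRINITY_c9_g2_i2", "TRINITY_c0_g1_i1",
                         "TRINITY_hypothetical"),
  logFC   = c(-13.87, 13.27, 11.23, 12.86, 0.50),
  P.Value = c(3.04e-30, 1.22e-27, 3.20e-26, 2.98e-24, 0.30),
  FDR     = c(1.88e-24, 3.77e-22, 6.60e-21, 4.61e-19, 0.50),
  stringsAsFactors = FALSE
)

data <- mutate(data, sig = ifelse(FDR < 0.001, "FDR<0.001", "Not Sig"))

# Rows to label: below the FDR cutoff AND with a non-empty gene name.
# trimws() also catches cells holding only spaces; is.na() guards against NA.
labelled <- filter(data, FDR < 0.001, !is.na(Gene), trimws(Gene) != "")

p <- ggplot(data, aes(logFC, -log10(P.Value))) +
  geom_point(aes(col = sig)) +
  # Named values tie each colour to its label regardless of factor order.
  scale_color_manual(values = c("FDR<0.001" = "red", "Not Sig" = "black")) +
  geom_text_repel(data = labelled, aes(label = Gene))
p
```

With this toy table only PAC1 and GH end up in `labelled`: the unnamed rows are dropped by the name condition, and "geneX" is dropped by the FDR cutoff.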