Question: Producing such a plot in R
A3.9k wrote:

Hello

I have the proportion of samples altered for a list of genes, with the associated p-values, like this:

gene    CNV            -log10_pvalue  Percentage_altered
CDKN2B  Deletion       3              69
CDKN2A  Deletion       3              69
RPL22   Deletion       0.087568       33
GATA6   Amplification  2.974694135    44
EGFR    Amplification  1.958607315    42
CCND1   Amplification  2.999132278    36
CDK6    Amplification  2.795880017    30
GATAD1  Amplification  2.795880017    30
KRAS    Amplification  2.999132278    22
MYB     Amplification  1.677780705    16
GATA4   Amplification  1.091514981    13
MYC     Amplification  2.22184875     52
CCNE1   Amplification  -0.000434077   0
TSHZ3   Amplification  -0.000434077   0
ERBB2   Amplification  -0.000434077   0

I want to visualise this data like the plot below, but I don't know how.

enter image description here

Any help please?

ADD COMMENTlink modified 27 days ago by rpolicastro2.3k • written 27 days ago by A3.9k

It looks to me like a mix of an inverted volcano plot and a bubble plot. In my experience, these two links should help you get there:

1. https://www.r-graph-gallery.com/320-the-basis-of-bubble-plot.html

2. https://www.bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html

You would definitely need to tweak the code. Is there a GitHub link in the paper you are referring to? Digging into that might also give some leads.

ADD REPLYlink modified 27 days ago • written 27 days ago by ivivek_ngs5.0k

You are currently missing the variable they used in their y-axis.

ADD REPLYlink written 27 days ago by rpolicastro2.3k

That does not seem to be the case. The y-axis here is the frequency of gain and deletion (%), which in the OP's data is the last column (Percentage_altered), if I understand correctly.

ADD REPLYlink modified 27 days ago • written 27 days ago by ivivek_ngs5.0k

edit: I think the y-axis and the point size are the same variable, but on the y-axis they effectively make the percentage negative for deletion and positive for gain.

ADD REPLYlink modified 27 days ago • written 27 days ago by rpolicastro2.3k
rpolicastro2.3k wrote:

The example data.

df <- structure(list(gene = c("CDKN2B", "CDKN2A", "RPL22", "GATA6", 
"EGFR", "CCND1", "CDK6", "GATAD1", "KRAS", "MYB", "GATA4", "MYC", 
"CCNE1", "TSHZ3", "ERBB2"), CNV = c("Deletion", "Deletion", "Deletion", 
"Amplification", "Amplification", "Amplification", "Amplification", 
"Amplification", "Amplification", "Amplification", "Amplification", 
"Amplification", "Amplification", "Amplification", "Amplification"
), log10_pvalue = c(3, 3, 0.087568, 2.974694135, 1.958607315, 
2.999132278, 2.795880017, 2.795880017, 2.999132278, 1.677780705, 
1.091514981, 2.22184875, -0.000434077, -0.000434077, -0.000434077
), Percentage_altered = c(69L, 69L, 33L, 44L, 42L, 36L, 30L, 
30L, 22L, 16L, 13L, 52L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-15L))

ggplot2 answer

library("tidyverse")
library("ggrepel")

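df %>%
  # Signed frequency: negative for deletions, positive for amplifications
  mutate(net_frequency=ifelse(CNV == "Deletion", -Percentage_altered/100, Percentage_altered/100)) %>%
  ggplot(aes(x=log10_pvalue, y=net_frequency)) +
    # Bubble size tracks the percentage altered; colour tracks -log10(p)
    geom_point(aes(size=Percentage_altered, color=log10_pvalue)) +
    # Label only genes with p < 0.05; force=10 pushes the labels apart
    geom_text_repel(aes(label=ifelse(log10_pvalue > -log10(0.05), gene, "")), force=10) +
    # Dashed line separating deletions (below) from amplifications (above)
    geom_hline(yintercept=0, lty=2) +
    theme_classic()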

enter image description here

ADD COMMENTlink modified 27 days ago • written 27 days ago by rpolicastro2.3k

Thank you so much

How can I put the gene name on the corresponding bubble, please?

ADD REPLYlink modified 27 days ago • written 27 days ago by A3.9k

I edited the post to include the gene names for genes with a p-value < 0.05.

ADD REPLYlink written 27 days ago by rpolicastro2.3k
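For readers following along, here is a minimal sketch of how that label filter behaves. It assumes the column `log10_pvalue` holds -log10(p), as in the answer; the five-gene subset and the names `genes`, `threshold` and `labelled` are only for illustration.

```r
# Toy subset of the example data; log10_pvalue holds -log10(p).
genes <- data.frame(
  gene = c("CDKN2A", "MYB", "GATA4", "RPL22", "ERBB2"),
  log10_pvalue = c(3, 1.677780705, 1.091514981, 0.087568, -0.000434077)
)

# p < 0.05 is the same as -log10(p) > -log10(0.05), roughly 1.30.
threshold <- -log10(0.05)

# Same rule as the label aesthetic in the answer: keep the gene name
# above the cutoff, otherwise use an empty string (no label drawn).
genes$label <- ifelse(genes$log10_pvalue > threshold, genes$gene, "")

# Genes that would receive a label.
labelled <- genes$gene[genes$label != ""]
print(labelled)
```

Changing the 0.05 inside `-log10()` moves the cutoff, so a looser p-value labels more genes.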

Sorry, this is my full data:

gene    CNV            -log10_pvalue  Percentage_altered
CDKN2B  Deletion       2.72E+01       69
CDKN2A  Deletion       2.72E+01       69
RPL22   Deletion       1.057654569    36
GATA6   Amplification  4.22184875     42
EGFR    Amplification  2              34
CCND1   Amplification  5.698970004    32
CDK6    Amplification  3.22184875     24
GATAD1  Amplification  3.22184875     24
KRAS    Amplification  5.698970004    24
MYB     Amplification  1.698970004    16
GATA4   Amplification  1.096910013    16
MYC     Amplification  2.22184875     52
CCNE1   Amplification  0              0
TSHZ3   Amplification  0              0
ERBB2   Amplification  0              0

CCNE1, TSHZ3 and ERBB2 are all altered in zero percent of samples, so I don't have a p-value for them; I used a p-value of 1, which gives -log10(1) = 0. On the plot I would therefore expect three bubbles at zero, but I see only one. Please correct me if I am wrong here.

I want to show the gene label for all genes, if possible.

enter image description here

ADD REPLYlink written 27 days ago by A3.9k

If their p-value and percentage are the same, the points will be exactly on top of each other.

ADD REPLYlink written 27 days ago by rpolicastro2.3k
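One possible way to make the overlap visible and still label every gene, sketched under assumptions: collapse genes that share the same coordinates into a single comma-separated label, then label all points. The data frame is retyped from the full data posted above. The name `collapsed` and the `max.overlaps = Inf` setting (available in recent ggrepel releases) are illustrative choices, not part of the original answer, and the plotting step only runs if ggplot2 and ggrepel are installed.

```r
# Full data as posted above; log10_pvalue holds -log10(p).
df <- data.frame(
  gene = c("CDKN2B", "CDKN2A", "RPL22", "GATA6", "EGFR", "CCND1", "CDK6",
           "GATAD1", "KRAS", "MYB", "GATA4", "MYC", "CCNE1", "TSHZ3", "ERBB2"),
  CNV = c(rep("Deletion", 3), rep("Amplification", 12)),
  log10_pvalue = c(27.2, 27.2, 1.057654569, 4.22184875, 2, 5.698970004,
                   3.22184875, 3.22184875, 5.698970004, 1.698970004,
                   1.096910013, 2.22184875, 0, 0, 0),
  Percentage_altered = c(69, 69, 36, 42, 34, 32, 24, 24, 24, 16, 16, 52, 0, 0, 0)
)

# Signed frequency, as in the answer: negative for deletions.
df$net_frequency <- ifelse(df$CNV == "Deletion",
                           -df$Percentage_altered, df$Percentage_altered) / 100

# Rows that sit exactly on top of an earlier row.
hidden <- df$gene[duplicated(df[, c("log10_pvalue", "net_frequency")])]

# One row per distinct point; genes sharing a point are pasted together.
collapsed <- aggregate(gene ~ log10_pvalue + net_frequency + Percentage_altered,
                       data = df, FUN = paste, collapse = ", ")

# Build the plot only when the plotting packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggrepel", quietly = TRUE)) {
  p <- ggplot2::ggplot(collapsed,
                       ggplot2::aes(x = log10_pvalue, y = net_frequency)) +
    ggplot2::geom_point(ggplot2::aes(size = Percentage_altered,
                                     color = log10_pvalue)) +
    # Label every point; max.overlaps = Inf stops ggrepel dropping labels.
    ggrepel::geom_text_repel(ggplot2::aes(label = gene), max.overlaps = Inf) +
    ggplot2::geom_hline(yintercept = 0, linetype = "dashed") +
    ggplot2::theme_classic()
  # print(p) draws the figure.
}
```

With this data, `hidden` lists the genes whose bubbles are covered by another gene, and each shared bubble gets one combined label instead of several stacked ones.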