Question: R: Plotting several variables using ggplot geom_bar
regue.hadrien wrote 4 months ago:

Hi there,

I'm currently struggling with ggplot's geom_bar.

To present the data quickly: we've computed the proportion of several viruses in different kinds of samples (column sample). Three different protocols have been used (column proto), each in triplicate (column dupli).

I would like to plot the proportion of each virus in each assay, depending on the protocol used, with separate results for each kind of sample.

Here is an extract from my data:

   sample proto dupli virus               prop
 1 HSV    E     S3    Mastadenovirus      0.00000770
 2 HSV    E     S3    Orthopneumovirus    0
 3 HSV    E     S3    Simplexvirus        0.996
 4 HSV    E     S3    Alphainfluenzavirus 0
 5 VRS    E     S3    Enterovirus         0
 6 HSV    E     S3    Dependoparvovirus   0
 7 HSV    E     S3    Levivirus           0.0000847
 8 HSV    E     S3    others              0.00373
 9 HSV    E     S10   Mastadenovirus      0.0000136
10 HSV    E     S10   Orthopneumovirus    0
11 MOCK   E     S3    Levivirus           0.0000847

I got the beginning of an answer on Stack Overflow, which produces an incomplete result like this:

library(ggplot2)
ggplot(subtable2, aes(fill = virus, x = proto, y = prop)) +
  geom_bar(position = "stack", stat = "identity")  # stacked bars, one per protocol

[Image: current result]

How can I improve my code to visualize the proportion results for each "dupli" member, and get something like this?

[Image: final plot]

I tried to create a new variable with:


But I'm not sure this is the right way. Do you have any pointers on how to do this, or should I rework my data and visualisation?
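
For reference, a combined variable like the sampledupli column that appears later in this thread (values such as HSV_S10) could be built roughly as follows. This is only a sketch: the three-row data frame is made up for illustration, and the paste() call is an assumption about how such a column could be created, not the code from the original post.

```r
# Made-up stand-in for subtable2, using the column names from the extract
subtable2 <- data.frame(
  sample = c("HSV", "HSV", "MOCK"),
  proto  = c("E", "E", "E"),
  dupli  = c("S3", "S10", "S1"),
  stringsAsFactors = FALSE
)

# Assumption: one label per sample/replicate pair, e.g. "HSV_S3"
subtable2$sampledupli <- paste(subtable2$sample, subtable2$dupli, sep = "_")

print(subtable2$sampledupli)
```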

Thanks, have a nice weekend!

ggplot2 R • 198 views
modified 4 months ago by aaragak1 • written 4 months ago by regue.hadrien

Hello! I hope you're keeping well.

What happens when you use sampledupli as your faceting variable?

I'm trying to generate a toy dataset that I can work with to help you with this problem, but I'm having trouble figuring out what prop is referring to exactly. Am I correct in assuming that it is the proportion of a given virus for that particular sample, protocol, and duplicate?


written 4 months ago by aaragak1

Thanks for the answer!

prop is the viral proportion within each replicate of the triplicate. If you sum prop for each one, you get 1:

    aggregate(subtable2$prop, list(subtable2$sampledupli), function(x) sum(x))
    Group.1 x
    1   HSV_S10 1
    2   HSV_S17 1
    3   HSV_S24 1
    4    HSV_S3 1
    5   HSV_S31 1
    6   HSV_S38 1
    7   HSV_S44 1
    8   HSV_S49 1
    9   HSV_S54 1
   10  MOCK_S1 1
   11 MOCK_S15 1
   12 MOCK_S22 1
   13 MOCK_S29 1
   14 MOCK_S36 1
   15 MOCK_S43 1
   16 MOCK_S48 1
   17 MOCK_S53 1
   18  MOCK_S8 1
   19  VRS_S12 1
   20  VRS_S19 1
   21  VRS_S26 1
   22  VRS_S33 1
   23  VRS_S40 1
   24  VRS_S45 1
   25   VRS_S5 1
   26  VRS_S50 1
   27  VRS_S55 1

[Image: plot with sample dupli]

written 4 months ago by regue.hadrien

aaragak1 wrote 4 months ago:

I've tried something like this - let me know if this is what you were going for

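library(dplyr)    # provides tibble(), %>%, group_by(), summarise()
library(ggplot2)

# Toy dataset: 3 sample types x 3 protocols x 3 replicates, random virus labels
reprex <-  
    tibble(sample = rep(c("HSV", "MOCK", "VRS"), each = 300), 
           proto = rep(c("E", "M", "Q"), each = 100, times = 3), 
           dupli = rep(1:3, times = 300), 
           virus = sample(LETTERS[1:7], 900, replace = TRUE)) %>%
    group_by(sample, proto, dupli, virus) %>%
    # one row per virus and replicate, so each count enters the stack only once
    summarise(num_virus = n(), .groups = "drop")

# position = "fill" rescales every bar to proportions
ggplot(reprex, aes(fill = virus, y = num_virus, x = dupli)) + 
    geom_bar(position = "fill", stat = "identity") +
    facet_grid(cols = vars(sample, proto))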


modified 4 months ago • written 4 months ago by aaragak1

Well, this is the kind of plot I'm trying to produce. Thanks! However, I'm getting this: [Image: my results]

I'm guessing that the awful output is caused by my variable dupli. Do you know how I can recode this variable to get 1/2/3 for each sample, like you have, instead of SX? The number itself doesn't matter (it was just an experimental trial number).

written 4 months ago by regue.hadrien

Off the top of my head, you might be able to use dplyr's mutate + case_when, or perhaps as.factor? I'm not entirely sure what the current levels of the column are right now.
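
A minimal sketch of the mutate idea, assuming each sample/protocol combination has its own set of run labels (S3, S10, ...) in dupli and that their order does not matter. The toy data frame and the new column name rep_id are made up for illustration.

```r
library(dplyr)

# Toy data: two sample/protocol groups, three run labels each, two viruses per run
toy <- data.frame(
  sample = rep(c("HSV", "MOCK"), each = 6),
  proto  = "E",
  dupli  = rep(c("S3", "S10", "S17", "S1", "S8", "S15"), each = 2),
  virus  = rep(c("Simplexvirus", "others"), times = 6),
  stringsAsFactors = FALSE
)

# Within each group, number the run labels 1, 2, 3 in order of first appearance
toy <- toy %>%
  group_by(sample, proto) %>%
  mutate(rep_id = match(dupli, unique(dupli))) %>%
  ungroup()

print(distinct(toy, sample, dupli, rep_id))
```

rep_id could then replace dupli as the x aesthetic in the plot. Unlike appending a hand-written 1, 2, 3 vector, this does not depend on the rows being sorted in a particular order.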

written 4 months ago by aaragak1

I'm not an expert in the tidyverse, so I just tried a dirty method: adding a c(1,2,3,...,1,2,3) vector to my table. Thanks to your code, it worked perfectly! [Image: final plot]

Thank you all for your answers!

written 4 months ago by regue.hadrien

I think not all of your samples have prop values in every group. Try making the scales independent with the scales option of facet_grid.
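
A small sketch of that suggestion, assuming the crowded axis comes from every panel showing every dupli label. With scales = "free_x", each column of panels keeps only the labels that occur in it. The toy data and values below are made up.

```r
library(ggplot2)

# Made-up data: each sample type has its own two run labels
toy <- data.frame(
  sample = rep(c("HSV", "MOCK"), each = 4),
  dupli  = rep(c("S3", "S10", "S1", "S8"), each = 2),
  virus  = rep(c("Simplexvirus", "others"), times = 4),
  prop   = rep(c(0.9, 0.1), times = 4)
)

p <- ggplot(toy, aes(x = dupli, y = prop, fill = virus)) +
  geom_bar(stat = "identity", position = "stack") +
  # free_x: each sample column shows only its own dupli labels
  facet_grid(cols = vars(sample), scales = "free_x")

print(p)
```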

modified 4 months ago • written 4 months ago by cpad0112