Question: How to use tidyr package to average replicate data for different time points and plot each gene separately?
1
gravatar for Wuschel
16 months ago by
Wuschel170
HUJI
Wuschel170 wrote:

I have a data set of >100 different samples. Samples are from different genotypes (e.g. X, Y, Z) and 4 different time points (T0,1,2,3) with 3 biological replicates (R1,2,3). I'm measuring values for 50 different genes (in raws). Data Set look like this

For each gene (i.e. each column), I want to plot a graph with an average of replicates of each genotype + SE expected final graph I would like to do this by creating a new data frame; containing for each set of replicates the mean and Std Error. How is this possible using tidyr package? How can I include Std Error? How can I improve this coding?

data.mean<- data.frame(matrix(nrows=50)) for(col in seq(1,length(colnames(data)), by=3)) {data.mean <-cbind(data.mean,apply(subset(data, select=seq(col,length.out = 3)),1,mean, na.rm = TRUE)) colnames(data.mean)[ncol(data.mean)] <- colnames(data)[col]}

 

structure(list(Gene = structure(1:2, .Label = c("A", "B"), class = "factor"), 
    X_T0_R1 = c(1.46559502, 0.220140568), X_T0_R2 = c(1.087642983, 
    0.237500819), X_T0_R3 = c(1.424945196, 0.21066267), X_T1_R1 = c(1.289943948, 
    0.207778662), X_T1_R2 = c(1.376535013, 0.488774258), X_T1_R3 = c(1.833390311, 
    0.182798731), X_T2_R1 = c(1.450753714, 0.247576125), X_T2_R2 = c(1.3094609, 
    0.390028842), X_T2_R3 = c(0.5953716, 1.007079177), X_T3_R1 = c(0.7906009, 
    0.730242116), X_T3_R2 = c(1.215333041, 1.012914813), X_T3_R3 = c(1.069312467, 
    0.780421013), Y_T0_R1 = c(0.053317766, 3.316414959), Y_T0_R2 = c(0.506623748, 
    3.599442788), Y_T0_R3 = c(0.713670106, 2.516735845), Y_T1_R1 = c(0.740998252, 
    1.444496448), Y_T1_R2 = c(0.648231834, 0.097957459), Y_T1_R3 = c(0.780499252, 
    0.187840968), Y_T2_R1 = c(0.35344654, 1.190274584), Y_T2_R2 = c(0.220223951, 
    1.367784148), Y_T2_R3 = c(0.432856978, 1.403057729), Y_T3_R1 = c(0.234963735, 
    1.232129062), Y_T3_R2 = c(0.353770497, 0.885122768), Y_T3_R3 = c(0.396091395, 
    1.333921747), Z_T0_R1 = c(0.398000559, 1.286528398), Z_T0_R2 = c(0.384759325, 
    1.122251177), Z_T0_R3 = c(1.582230097, 0.697419716), Z_T1_R1 = c(1.136843842, 
    0.804552001), Z_T1_R2 = c(1.275683837, 1.227821594), Z_T1_R3 = c(0.963349308, 
    0.968589683), Z_T2_R1 = c(3.765036263, 0.477443352), Z_T2_R2 = c(1.901023385, 
    0.832736132), Z_T2_R3 = c(1.407713024, 0.911920317), Z_T3_R1 = c(0.988333629, 
    1.095130142), Z_T3_R2 = c(0.618606729, 0.497458337), Z_T3_R3 = c(0.429823986, 
    0.471389536)), .Names = c("Gene", "X_T0_R1", "X_T0_R2", "X_T0_R3", 
"X_T1_R1", "X_T1_R2", "X_T1_R3", "X_T2_R1", "X_T2_R2", "X_T2_R3", 
"X_T3_R1", "X_T3_R2", "X_T3_R3", "Y_T0_R1", "Y_T0_R2", "Y_T0_R3", 
"Y_T1_R1", "Y_T1_R2", "Y_T1_R3", "Y_T2_R1", "Y_T2_R2", "Y_T2_R3", 
"Y_T3_R1", "Y_T3_R2", "Y_T3_R3", "Z_T0_R1", "Z_T0_R2", "Z_T0_R3", 
"Z_T1_R1", "Z_T1_R2", "Z_T1_R3", "Z_T2_R1", "Z_T2_R2", "Z_T2_R3", 
"Z_T3_R1", "Z_T3_R2", "Z_T3_R3"), class = "data.frame", row.names = c(NA, 
-2L))
R gene • 786 views
ADD COMMENTlink modified 16 months ago • written 16 months ago by Wuschel170
1
library(tidyr)
df1=gather(df,"TP","Values",-Gene)

library(stringr)
df2=cbind(df1,str_split_fixed(df1$TP,"_",3))
colnames(df2)[4:6]=c("genotype","time","replicate")
df3=df2[df2$Gene=="A",]

library(Rmisc)
sum_stats <- summarySE(df3, measurevar="Values", groupvars=c("genotype","time"))
colnames(sum_stats)

library(ggplot2)
ggplot(sum_stats, aes(genotype, Values, fill = genotype)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ time,
             ncol = 4,
             nrow = 1,
             strip.position = "bottom") +
  labs(title = "GeneA", x = "Time (hr)", y = "Measurement") +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20),
    strip.text = element_text(size = 20),
    axis.title.y = element_text(size = 20),
    axis.title.x = element_text(size = 20),
    legend.position = "none",
#    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.y = element_text(size = 14)) + 
  geom_errorbar(aes(ymax = Values + sd, ymin = Values - sd))

Rplot

ADD REPLYlink modified 16 months ago • written 16 months ago by cpad011211k

could you provide us with a minimal reproducible dataset using dput function in R (just copy paste the results of dput(df) here ; df is your dataframe)

ADD REPLYlink written 16 months ago by Nicolas Rosewick8.1k

Hi Nicolas, if you don't mind, I've emailed sample data file for you. You can use it public.

ADD REPLYlink modified 16 months ago • written 16 months ago by Wuschel170

BIOAWY : Please post a sample of the data here. We encourage all communication to stay on the forum. I suggest that you edit the original post and add the data there.

ADD REPLYlink written 16 months ago by genomax71k

@genomax My apologies, I'm not aware how to enter this data here! Can I attach CSV file.

ADD REPLYlink written 16 months ago by Wuschel170

You don't need to attach the full file. Can you use the command @Nicolas gave (dput(df)) and post the results here?

ADD REPLYlink written 16 months ago by genomax71k

I'm sorry I could not do this :( error comes, I'm happy to give sample data file, if it possible attached

ADD REPLYlink written 16 months ago by Wuschel170
1

Oh well. Hopefully you emailed the file to right @Nicolas. He can post the data excerpt when he posts a solution.

ADD REPLYlink modified 16 months ago • written 16 months ago by genomax71k

I've tried, hope that will help

ADD REPLYlink written 16 months ago by Wuschel170
2
gravatar for russhh
16 months ago by
russhh4.7k
UK, U. Glasgow
russhh4.7k wrote:
df_long <- df %>%
gather("expt", "measurement", -1) %>%
mutate(
    genotype = substring(expt, 1, 1),
    time = substring(expt, 3, 4)
) %>%
group_by(genotype, time) %>%
summarise(
    mean = mean(measurement),
    se = sd(measurement) / sqrt(n())
)

df_long %>%
    ggplot(aes(x = time, y = mean)) +
    geom_bar(stat = "identity") +
    geom_errorbar(aes(ymin = mean - se, ymax = mean + se)) +
    facet_wrap(~ genotype)
ADD COMMENTlink modified 16 months ago • written 16 months ago by russhh4.7k
1

... though I think you should be presenting mean +/- SD

ADD REPLYlink written 16 months ago by russhh4.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 953 users visited in the last hour