Question: How to use tidyr package to average replicate data for different time points and plot each gene separately?
1
gravatar for Wox
2.6 years ago by
Wox370
HUJI
Wox370 wrote:

I have a data set of >100 different samples. Samples are from different genotypes (e.g. X, Y, Z) and 4 different time points (T0,1,2,3) with 3 biological replicates (R1,2,3). I'm measuring values for 50 different genes (in raws). Data Set look like this

For each gene (i.e. each column), I want to plot a graph with an average of replicates of each genotype + SE expected final graph I would like to do this by creating a new data frame; containing for each set of replicates the mean and Std Error. How is this possible using tidyr package? How can I include Std Error? How can I improve this coding?

data.mean<- data.frame(matrix(nrows=50)) for(col in seq(1,length(colnames(data)), by=3)) {data.mean <-cbind(data.mean,apply(subset(data, select=seq(col,length.out = 3)),1,mean, na.rm = TRUE)) colnames(data.mean)[ncol(data.mean)] <- colnames(data)[col]}

 

structure(list(Gene = structure(1:2, .Label = c("A", "B"), class = "factor"), 
    X_T0_R1 = c(1.46559502, 0.220140568), X_T0_R2 = c(1.087642983, 
    0.237500819), X_T0_R3 = c(1.424945196, 0.21066267), X_T1_R1 = c(1.289943948, 
    0.207778662), X_T1_R2 = c(1.376535013, 0.488774258), X_T1_R3 = c(1.833390311, 
    0.182798731), X_T2_R1 = c(1.450753714, 0.247576125), X_T2_R2 = c(1.3094609, 
    0.390028842), X_T2_R3 = c(0.5953716, 1.007079177), X_T3_R1 = c(0.7906009, 
    0.730242116), X_T3_R2 = c(1.215333041, 1.012914813), X_T3_R3 = c(1.069312467, 
    0.780421013), Y_T0_R1 = c(0.053317766, 3.316414959), Y_T0_R2 = c(0.506623748, 
    3.599442788), Y_T0_R3 = c(0.713670106, 2.516735845), Y_T1_R1 = c(0.740998252, 
    1.444496448), Y_T1_R2 = c(0.648231834, 0.097957459), Y_T1_R3 = c(0.780499252, 
    0.187840968), Y_T2_R1 = c(0.35344654, 1.190274584), Y_T2_R2 = c(0.220223951, 
    1.367784148), Y_T2_R3 = c(0.432856978, 1.403057729), Y_T3_R1 = c(0.234963735, 
    1.232129062), Y_T3_R2 = c(0.353770497, 0.885122768), Y_T3_R3 = c(0.396091395, 
    1.333921747), Z_T0_R1 = c(0.398000559, 1.286528398), Z_T0_R2 = c(0.384759325, 
    1.122251177), Z_T0_R3 = c(1.582230097, 0.697419716), Z_T1_R1 = c(1.136843842, 
    0.804552001), Z_T1_R2 = c(1.275683837, 1.227821594), Z_T1_R3 = c(0.963349308, 
    0.968589683), Z_T2_R1 = c(3.765036263, 0.477443352), Z_T2_R2 = c(1.901023385, 
    0.832736132), Z_T2_R3 = c(1.407713024, 0.911920317), Z_T3_R1 = c(0.988333629, 
    1.095130142), Z_T3_R2 = c(0.618606729, 0.497458337), Z_T3_R3 = c(0.429823986, 
    0.471389536)), .Names = c("Gene", "X_T0_R1", "X_T0_R2", "X_T0_R3", 
"X_T1_R1", "X_T1_R2", "X_T1_R3", "X_T2_R1", "X_T2_R2", "X_T2_R3", 
"X_T3_R1", "X_T3_R2", "X_T3_R3", "Y_T0_R1", "Y_T0_R2", "Y_T0_R3", 
"Y_T1_R1", "Y_T1_R2", "Y_T1_R3", "Y_T2_R1", "Y_T2_R2", "Y_T2_R3", 
"Y_T3_R1", "Y_T3_R2", "Y_T3_R3", "Z_T0_R1", "Z_T0_R2", "Z_T0_R3", 
"Z_T1_R1", "Z_T1_R2", "Z_T1_R3", "Z_T2_R1", "Z_T2_R2", "Z_T2_R3", 
"Z_T3_R1", "Z_T3_R2", "Z_T3_R3"), class = "data.frame", row.names = c(NA, 
-2L))
R gene • 1.5k views
ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by Wox370
1
library(tidyr)
df1=gather(df,"TP","Values",-Gene)

library(stringr)
df2=cbind(df1,str_split_fixed(df1$TP,"_",3))
colnames(df2)[4:6]=c("genotype","time","replicate")
df3=df2[df2$Gene=="A",]

library(Rmisc)
sum_stats <- summarySE(df3, measurevar="Values", groupvars=c("genotype","time"))
colnames(sum_stats)

library(ggplot2)
ggplot(sum_stats, aes(genotype, Values, fill = genotype)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ time,
             ncol = 4,
             nrow = 1,
             strip.position = "bottom") +
  labs(title = "GeneA", x = "Time (hr)", y = "Measurement") +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20),
    strip.text = element_text(size = 20),
    axis.title.y = element_text(size = 20),
    axis.title.x = element_text(size = 20),
    legend.position = "none",
#    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.y = element_text(size = 14)) + 
  geom_errorbar(aes(ymax = Values + sd, ymin = Values - sd))

Rplot

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by cpad011214k

could you provide us with a minimal reproducible dataset using dput function in R (just copy paste the results of dput(df) here ; df is your dataframe)

ADD REPLYlink written 2.6 years ago by Nicolas Rosewick9.2k

Hi Nicolas, if you don't mind, I've emailed sample data file for you. You can use it public.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Wox370

BIOAWY : Please post a sample of the data here. We encourage all communication to stay on the forum. I suggest that you edit the original post and add the data there.

ADD REPLYlink written 2.6 years ago by genomax92k

@genomax My apologies, I'm not aware how to enter this data here! Can I attach CSV file.

ADD REPLYlink written 2.6 years ago by Wox370

You don't need to attach the full file. Can you use the command @Nicolas gave (dput(df)) and post the results here?

ADD REPLYlink written 2.6 years ago by genomax92k

I'm sorry I could not do this :( error comes, I'm happy to give sample data file, if it possible attached

ADD REPLYlink written 2.6 years ago by Wox370
1

Oh well. Hopefully you emailed the file to right @Nicolas. He can post the data excerpt when he posts a solution.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by genomax92k

I've tried, hope that will help

ADD REPLYlink written 2.6 years ago by Wox370
2
gravatar for russhh
2.6 years ago by
russhh5.5k
UK, U. Glasgow
russhh5.5k wrote:
df_long <- df %>%
gather("expt", "measurement", -1) %>%
mutate(
    genotype = substring(expt, 1, 1),
    time = substring(expt, 3, 4)
) %>%
group_by(genotype, time) %>%
summarise(
    mean = mean(measurement),
    se = sd(measurement) / sqrt(n())
)

df_long %>%
    ggplot(aes(x = time, y = mean)) +
    geom_bar(stat = "identity") +
    geom_errorbar(aes(ymin = mean - se, ymax = mean + se)) +
    facet_wrap(~ genotype)
ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by russhh5.5k
1

... though I think you should be presenting mean +/- SD

ADD REPLYlink written 2.6 years ago by russhh5.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1477 users visited in the last hour