Question: How to plot a multiple-line graph with mean and standard error for the following dataset?
Asked 15 months ago by Wuschel (HUJI):

I have a data set of >100 different samples. The samples come from different genotypes (e.g. X, Y, Z) and 4 different time points (T0, T1, T2, T3), with 3 biological replicates each (R1, R2, R3). I'm measuring values for 50 different genes (in rows).

structure(list(Gene = structure(1:2, .Label = c("A", "B"), class = "factor"), 
X_T0_R1 = c(1.46559502, 0.220140568), X_T0_R2 = c(1.087642983, 
0.237500819), X_T0_R3 = c(1.424945196, 0.21066267), X_T1_R1 = c(1.289943948, 
0.207778662), X_T1_R2 = c(1.376535013, 0.488774258), X_T1_R3 = c(1.833390311, 
0.182798731), X_T2_R1 = c(1.450753714, 0.247576125), X_T2_R2 = c(1.3094609, 
0.390028842), X_T2_R3 = c(0.5953716, 1.007079177), X_T3_R1 = c(0.7906009, 
0.730242116), X_T3_R2 = c(1.215333041, 1.012914813), X_T3_R3 = c(1.069312467, 
0.780421013), Y_T0_R1 = c(0.053317766, 3.316414959), Y_T0_R2 = c(0.506623748, 
3.599442788), Y_T0_R3 = c(0.713670106, 2.516735845), Y_T1_R1 = c(0.740998252, 
1.444496448), Y_T1_R2 = c(0.648231834, 0.097957459), Y_T1_R3 = c(0.780499252, 
0.187840968), Y_T2_R1 = c(0.35344654, 1.190274584), Y_T2_R2 = c(0.220223951, 
1.367784148), Y_T2_R3 = c(0.432856978, 1.403057729), Y_T3_R1 = c(0.234963735, 
1.232129062), Y_T3_R2 = c(0.353770497, 0.885122768), Y_T3_R3 = c(0.396091395, 
1.333921747), Z_T0_R1 = c(0.398000559, 1.286528398), Z_T0_R2 = c(0.384759325, 
1.122251177), Z_T0_R3 = c(1.582230097, 0.697419716), Z_T1_R1 = c(1.136843842, 
0.804552001), Z_T1_R2 = c(1.275683837, 1.227821594), Z_T1_R3 = c(0.963349308, 
0.968589683), Z_T2_R1 = c(3.765036263, 0.477443352), Z_T2_R2 = c(1.901023385, 
0.832736132), Z_T2_R3 = c(1.407713024, 0.911920317), Z_T3_R1 = c(0.988333629, 
1.095130142), Z_T3_R2 = c(0.618606729, 0.497458337), Z_T3_R3 = c(0.429823986, 
    0.471389536)), .Names = c("Gene", "X_T0_R1", "X_T0_R2", "X_T0_R3", 
"X_T1_R1", "X_T1_R2", "X_T1_R3", "X_T2_R1", "X_T2_R2", "X_T2_R3", 
"X_T3_R1", "X_T3_R2", "X_T3_R3", "Y_T0_R1", "Y_T0_R2", "Y_T0_R3", 
"Y_T1_R1", "Y_T1_R2", "Y_T1_R3", "Y_T2_R1", "Y_T2_R2", "Y_T2_R3", 
"Y_T3_R1", "Y_T3_R2", "Y_T3_R3", "Z_T0_R1", "Z_T0_R2", "Z_T0_R3", 
"Z_T1_R1", "Z_T1_R2", "Z_T1_R3", "Z_T2_R1", "Z_T2_R2", "Z_T2_R3", 
"Z_T3_R1", "Z_T3_R2", "Z_T3_R3"), class = "data.frame", row.names = c(NA, 
-2L))

For each gene (i.e. for each row), I want to plot a graph with the average of the replicates of each genotype + SE.

[Images: expected line graph pattern with SE; E.g. 1, E.g. 2]

For example, for Gene A, I want to plot all the genotypes (X, Y, Z) across the time points (0/1/3/5), so there should be 3 lines in the plot, as in the plots above.

How is this possible using R? How can I include the standard error? Can I use a loop to generate 50 graphs (a separate graph for each row)?


The values you furnished above have large deviations. See if the following plot works. Data is taken from the OP:

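# Read the tab-separated data (the data frame posted in the question)
df = read.csv("df1.txt", sep = "\t", stringsAsFactors = FALSE)

# Reshape from wide to long: one row per gene and sample column
library(tidyr)
df1 = gather(df, "TP", "Values", -Gene)

# Split sample names such as X_T0_R1 into genotype, time and replicate
library(stringr)
df2 = cbind(df1, str_split_fixed(df1$TP, "_", 3))
colnames(df2)[4:6] = c("genotype", "time", "replicate")

# Summarise replicates: N, mean, sd, se and ci per group
library(Rmisc)
df4 = summarySE(df2, measurevar = "Values", groupvars = c("time", "Gene", "genotype"))

# Plot the mean per genotype over time, one panel per gene
library(ggplot2)
ggplot(df4, aes(time, Values, group = genotype, color = genotype)) +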
  geom_line() +
  geom_point() +
  facet_wrap( ~ Gene) +
  labs(title = "Gene expression over 16 hr", x = "Time (hr)", y = "Measurement") +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20),
    strip.text = element_text(size = 20),
    axis.title.y = element_text(size = 20),
    axis.title.x = element_text(size = 20),
    axis.text.x = element_text(size = 14),
    axis.text.y = element_text(size = 14)
  ) +
  geom_ribbon(aes(ymax = Values + sd, ymin = Values - sd),
              alpha = 0.5,
              fill = "grey70",
              colour=NA
              )

Rplot01
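Note that the ribbon in this plot shows mean ± SD, while the question asked for the standard error. The data frame returned by `summarySE` also carries an `se` column (next to `N`, `sd` and `ci`), so swapping `sd` for `se` in the `aes()` call should give SE bands instead. A minimal base-R sketch of the same summary, assuming SE = sd / sqrt(n) over the replicates; the helper name `mean_se` is hypothetical and the toy values are Gene A, genotype X, copied from the question:

```r
# Long-format toy data: Gene A, genotype X, time points T0 and T1
long <- data.frame(
  genotype = "X",
  time     = rep(c("T0", "T1"), each = 3),
  Values   = c(1.46559502, 1.087642983, 1.424945196,
               1.289943948, 1.376535013, 1.833390311)
)

# Hypothetical helper: mean and standard error of the mean for one group
mean_se <- function(x) {
  n <- sum(!is.na(x))
  c(mean = mean(x, na.rm = TRUE), se = sd(x, na.rm = TRUE) / sqrt(n))
}

# One row per genotype/time combination
summ <- aggregate(Values ~ genotype + time, data = long, FUN = mean_se)
# Flatten the matrix column into Values.mean and Values.se
summ <- do.call(data.frame, summ)
print(summ)
```

With only three replicates per group the SE is a rough estimate, so it is worth stating in the figure legend which of SD, SE or CI the bands represent.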

modified 15 months ago • written 15 months ago by cpad0112

Thank you, cpad0112.

I'm working with your code. I get an error after df4=summarySE(df2, measurevar="Values", groupvars=c("time","Gene","genotype")):

Error in summarySE(df2, measurevar = "Values", groupvars = c("time", "Gene", : could not find function "summarySE"

Could you please help me with this?

modified 15 months ago • written 15 months ago by Wuschel

Sorry, I forgot to add the following line: library(Rmisc). The summarySE function is from the Rmisc library, so load Rmisc first. I have updated the code.

written 15 months ago by cpad0112

Thanks, cpad0112. Sorry to bother you, but I get another error message:

Error in combine_vars(data, params$plot_env, vars, drop = params$drop) : At least one layer must contain all variables used for facetting

Would appreciate your help :)

written 15 months ago by Wuschel

Well, it would help if you could post your script (the ggplot part) here. Also check that you have a recent version of ggplot.

written 15 months ago by cpad0112

I see, I do not have ggplot. Is this different from ggplot2?

Where can I get this package? Googling doesn't help :(

df <- read.csv("SI_AVG_Line.csv")

library(tidyr)
df1 <- gather(df,"Transitions","Values",-Targets)
library(stringr)
df2 <- cbind(df1,str_split_fixed(df1$Transitions,"_",3))
colnames(df2)[4:6]=c("genotype","time","replicate")
library(Rmisc)
df4 <- summarySE(df2, measurevar="Values", groupvars=c("time","Targets","genotype"))
ggplot(df4, aes(time, Values, group = genotype, color = genotype)) +
  geom_line() +
  geom_point() +
  facet_wrap( ~ Gene) +
  labs(title = "Gene Expression vs time", x = "Time (d)", y = "Area_counts") +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20),
    strip.text = element_text(size = 20),
    axis.title.y = element_text(size = 20),
    axis.title.x = element_text(size = 20),
    axis.text.x = element_text(size = 14),
    axis.text.y = element_text(size = 14)
  ) +
  geom_ribbon(aes(ymax = Values + sd, ymin = Values - sd),
              alpha = 0.5,
              fill = "grey70",
              colour=NA
  )

@cpad0112 Below is the data from the .csv file that produces the error, in case the data itself is the problem.

modified 15 months ago • written 15 months ago by Wuschel
structure(list(Targets = c("A", "B", "C", "nor"), X_S1_0d_1 = c(1.940487232,

1.079594087, 1.459871602, 1), X_S1_0d_2 = c(1.940630815, 0.790986517, 0.836386383, 1), X_S1_0d_3 = c(2.05030161, 0.721115111, 0.802144144, 1), X_S1_1d_1 = c(0.927368618, 1.186737277, 0.765095737, 1), X_S1_1d_2 = c(1.159347963, 1.427045976, 1.196499915, 1), X_S1_1d_3 = c(1.009271935, 1.049367585, 0.748728559, 1), X_S1_3d_1 = c(0.794781558, 1.072762904, 1.288591327, 1), X_S1_3d_2 = c(0.698642658, 0.971534921, 0.923846091, 1), X_S1_3d_3 = c(0.938922191, 0.80228642, 1.433899521, 1), X_S1_5d_1 = c(0.768844884, 1.458863535, 0.880239008, 1), X_S1_5d_2 = c(0.586314866, 1.027767798, 0.831469797, 1), X_S1_5d_3 = c(0.604124099, 1.502330028, 1.101895903, 1), mut2_S1_0d_1 = c(2.085432338, 0.861943427, 0.509210189, 1), mut2_S1_0d_2 = c(1.774970153, 1.074569974, 3.128664718, 1), mut2_S1_0d_3 = c(2.003870102, 0.753483213, 1.047020362, 1), mut2_S1_1d_1 = c(1.168381858, 1.15001272, 0.580462548, 1), mut2_S1_1d_2 = c(1.33284456, 0.450460567, 0.959430252, 1), mut2_S1_1d_3 = c(1.106332747, 0.466636391, 0.660254618, 1), mut2_S1_3d_1 = c(0.859543853, 1.188445442, 1.044546139, 1), mut2_S1_3d_2 = c(1.022929555, 1.259366417, 1.776709656, 1), mut2_S1_3d_3 = c(0.917527143, 2.137370791, 0.669765284, 1), mut2_S1_5d_1 = c(0.642810843, 0.496709803, 0.801885112, 1), mut2_S1_5d_2 = c(0.879777521, 1.170165217, 1.793443182, 1), mut2_S1_5d_3 = c(0.816650769, 0.864352103, 0.768312731, 1), mut5_S1_0d_1 = c(1.936291138, 0.721197246, 1.885982652, 1), mut5_S1_0d_2 = c(2.136240851, 0.925363277, 0.282462799, 1), mut5_S1_0d_3 = c(1.986120429, 0.677085837, 0.124936834, 1), mut5_S1_1d_1 = c(1.346339786, 0.989266319, 1.396700558, 1), mut5_S1_1d_2 = c(1.489199506, 1.269083963, 1.48921516, 1), mut5_S1_1d_3 = c(1.584229502, 0.88246637, 2.25267634, 1), mut5_S1_3d_1 = c(0.755948531, 1.451613602, 0.898362008, 1), mut5_S1_3d_2 = c(0.824308907, 0.5962476, 0.523055204, 1), mut5_S1_3d_3 = c(0.753359409, 0.753222103, 0.948441646, 1), mut5_S1_5d_1 = c(0.788525215, 1.85338769, 0.951693842, 1), mut5_S1_5d_2 = c(1.010417043, 
1.983625345, 1.086768544, 1), mut5_S1_5d_3 = c(0.630454563, 1.439599004, 1.416591771, 1), mut7_S1_0d_1 = c(1.672072567, 0.611243763, 0.705364938, 1), mut7_S1_0d_2 = c(1.738837658, 0.503828595, 0.499147343, 1), mut7_S1_0d_3 = c(2.149037252, 1.192787265, 1.226895377, 1), mut7_S1_1d_1 = c(1.421761015, 1.084490092, 0.497815065, 1), mut7_S1_1d_2 = c(1.068782794, 0.584950798, 0.38078948, 1), mut7_S1_1d_3 = c(1.229045044, 0.822348277, 0.449995849, 1), mut7_S1_3d_1 = c(0.890386073, 0.802513638, 0.757190729, 1), mut7_S1_3d_2 = c(1.022619118, 0.806565748, 0.645204575, 1), mut7_S1_3d_3 = c(0.80348663, 0.753993198, 0.593817113, 1), mut7_S1_5d_1 = c(0.780575903, 0.724989068, 0.92248483, 1), mut7_S1_5d_2 = c(0.743592574, 1.279872561, 1.201887432, 1), mut7_S1_5d_3 = c(0.522476113, 0.751493063, 0.899865367, 1), mut9_S1_0d_1 = c(1.247510942, 0.762934403, 2.009134613, 1), mut9_S1_0d_2 = c(1.159843529, 0.684622155, 0.499925077, 1), mut9_S1_0d_3 = c(1.247510942, 0.762934403, 2.205521099, 1), mut9_S1_1d_1 = c(1.139288266, 0.530593446, 0.767442607, 1), mut9_S1_1d_2 = c(1.257958733, 0.780701299, 0.77153391, 1), mut9_S1_1d_3 = c(1.230762109, 0.536139676, 0.742313942, 1), mut9_S1_3d_1 = c(0.809093089, 0.59528538, 0.804481151, 1), mut9_S1_3d_2 = c(0.853017549, 0.826757331, 1.141960538, 1), mut9_S1_3d_3 = c(0.813029821, 0.748971384, 1.964723247, 1), mut9_S1_5d_1 = c(0.797277294, 1.327830526, 0.943500196, 1), mut9_S1_5d_2 = c(0.669946954, 1.011869145, 0.979867227, 1), mut9_S1_5d_3 = c(0.525670301, 1.067407334, 0.76001394, 1)), .Names = c("Targets", "X_S1_0d_1", "X_S1_0d_2", "X_S1_0d_3", "X_S1_1d_1", "X_S1_1d_2", "X_S1_1d_3", "X_S1_3d_1", "X_S1_3d_2", "X_S1_3d_3", "X_S1_5d_1", "X_S1_5d_2", "X_S1_5d_3", "mut2_S1_0d_1", "mut2_S1_0d_2", "mut2_S1_0d_3", "mut2_S1_1d_1", "mut2_S1_1d_2", "mut2_S1_1d_3", "mut2_S1_3d_1", "mut2_S1_3d_2", "mut2_S1_3d_3", "mut2_S1_5d_1", "mut2_S1_5d_2", "mut2_S1_5d_3", "mut5_S1_0d_1", "mut5_S1_0d_2", "mut5_S1_0d_3", "mut5_S1_1d_1", "mut5_S1_1d_2", "mut5_S1_1d_3", 
"mut5_S1_3d_1", "mut5_S1_3d_2", "mut5_S1_3d_3", "mut5_S1_5d_1", "mut5_S1_5d_2", "mut5_S1_5d_3", "mut7_S1_0d_1", "mut7_S1_0d_2", "mut7_S1_0d_3", "mut7_S1_1d_1", "mut7_S1_1d_2", "mut7_S1_1d_3", "mut7_S1_3d_1", "mut7_S1_3d_2", "mut7_S1_3d_3", "mut7_S1_5d_1", "mut7_S1_5d_2", "mut7_S1_5d_3", "mut9_S1_0d_1", "mut9_S1_0d_2", "mut9_S1_0d_3", "mut9_S1_1d_1", "mut9_S1_1d_2", "mut9_S1_1d_3", "mut9_S1_3d_1", "mut9_S1_3d_2", "mut9_S1_3d_3", "mut9_S1_5d_1", "mut9_S1_5d_2", "mut9_S1_5d_3"), class = "data.frame", row.names = c(NA, -4L))

written 15 months ago by Wuschel

The issue with the code is here: df2 <- cbind(df1,str_split_fixed(df1$Transitions,"_",3)). When you split the string (example: mut9_S1_3d_2) on _, you get 4 strings, but you are selecting only 3. Also, I meant ggplot2, not ggplot.
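A quick way to see the difference, sketched here with base R's `strsplit` rather than `str_split_fixed`: the column names in the question have three underscore-separated fields, while those in the real file have four. Note that `str_split_fixed(x, "_", 3)` does not discard the extra field; it leaves it attached to the last piece (e.g. `3d_2`), so time and replicate are never separated. The variable names below are illustrative only:

```r
old_name <- "X_T0_R1"       # naming scheme in the question
new_name <- "mut9_S1_3d_2"  # naming scheme in the real file

# Number of underscore-separated fields in each name
n_fields <- lengths(strsplit(c(old_name, new_name), "_", fixed = TRUE))
print(n_fields)

# Split the four-field name and rebuild the genotype label from the first two pieces
parts      <- strsplit(new_name, "_", fixed = TRUE)[[1]]
genotype   <- paste(parts[1], parts[2], sep = "_")
time_point <- parts[3]
rep_id     <- parts[4]
print(c(genotype = genotype, time = time_point, replicate = rep_id))
```

Checking the field count on the real column names before choosing the `n` argument avoids this kind of silent mis-split.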

code:

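# `test` is assumed to be the data frame posted above (the dput() output)
library(tidyr)
df1=gather(test,"TP","Values",-Targets)

# Names such as mut9_S1_3d_2 have four fields, so split into 4 pieces
library(stringr)
df2=cbind(df1,str_split_fixed(df1$TP,"_",4))
colnames(df2)[4:7]=c("a","b","time","replicate")
# Rebuild the genotype label (e.g. mut9_S1) from the first two pieces
df2$mut = paste(df2$a, df2$b, sep="_")

library(dplyr)
df3=select(df2, -c(a,b))

library(Rmisc)
names(df3)
df4=summarySE(df3, measurevar="Values", groupvars=c("time","mut","Targets"))
head(df4)  # inspect the summary (or View(df4) in RStudio)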

library(ggplot2)
ggplot(df4, aes(time, Values, group = mut, color = mut)) +
  geom_line() +
  geom_point() +
  facet_wrap( ~ Targets) +
  labs(title = "Gene expression ", x = "Time (hr)", y = "Measurement") +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20),
    strip.text = element_text(size = 20),
    axis.title.y = element_text(size = 20),
    axis.title.x = element_text(size = 20),
    axis.text.x = element_text(size = 14),
    axis.text.y = element_text(size = 14)
  ) +
  geom_ribbon(aes(ymax = Values + sd, ymin = Values - sd),
              alpha = 0.3,
              fill = "grey70",
              colour=NA
  )

Rplot01
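A side note on the earlier error, "At least one layer must contain all variables used for facetting": ggplot2 reports this when the faceting variable is not a column of the plotted data. In the script posted earlier the summary was grouped by `Targets`, but the plot still used `facet_wrap( ~ Gene)`; the code above uses `facet_wrap( ~ Targets)`, which matches the data. A small pre-flight check, sketched in base R; `missing_facet_vars` is a hypothetical helper, and `demo_cols` lists the column names I would expect `summarySE` to produce for `df4`:

```r
# Column names expected in df4 (grouping variables, then N, mean, sd, se, ci)
demo_cols <- c("time", "mut", "Targets", "N", "Values", "sd", "se", "ci")

# Hypothetical helper: which variables in a facet formula are absent from the data?
missing_facet_vars <- function(facet_formula, cols) {
  setdiff(all.vars(facet_formula), cols)
}

print(missing_facet_vars(~ Gene, demo_cols))     # variable used in the failing script
print(missing_facet_vars(~ Targets, demo_cols))  # variable used in the working code
```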

With error bars only (mean +/- SD):

Rplot02_errorbar

For error bars, replace the following part of the code:

geom_ribbon(aes(ymax = Values + sd, ymin = Values - sd),
              alpha = 0.3,
              fill = "grey70",
              colour=NA
  )

with:

geom_pointrange(aes(ymax = Values + sd, ymin = Values - sd))
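The original question also asked about standard error bars and about producing one graph per gene in a loop, which the faceted plot does not cover directly. A hedged sketch of one way to do it, assuming the ggplot2 package is installed; the toy data are a few values copied from the second dataset in this thread (genes A and B, genotypes X_S1 and mut2_S1, days 0 and 1), and the object names and the commented-out output file names are hypothetical:

```r
library(ggplot2)

# Toy long-format data; expand.grid varies `replicate` fastest, then Targets, time, mut
raw <- expand.grid(replicate = 1:3, Targets = c("A", "B"),
                   time = c("0d", "1d"), mut = c("X_S1", "mut2_S1"),
                   stringsAsFactors = FALSE)
raw$Values <- c(
  1.940487232, 1.940630815, 2.05030161,   # X_S1,    0d, A
  1.079594087, 0.790986517, 0.721115111,  # X_S1,    0d, B
  0.927368618, 1.159347963, 1.009271935,  # X_S1,    1d, A
  1.186737277, 1.427045976, 1.049367585,  # X_S1,    1d, B
  2.085432338, 1.774970153, 2.003870102,  # mut2_S1, 0d, A
  0.861943427, 1.074569974, 0.753483213,  # mut2_S1, 0d, B
  1.168381858, 1.33284456,  1.106332747,  # mut2_S1, 1d, A
  1.15001272,  0.450460567, 0.466636391   # mut2_S1, 1d, B
)

# Mean and standard error (sd / sqrt(n)) per gene, genotype and time point
summ <- aggregate(Values ~ Targets + mut + time, data = raw,
                  FUN = function(x) c(mean = mean(x), se = sd(x) / sqrt(length(x))))
summ <- do.call(data.frame, summ)  # columns Values.mean and Values.se

# One plot per gene, stored in a named list
plots <- list()
for (g in as.character(unique(summ$Targets))) {
  one_gene <- summ[summ$Targets == g, ]
  plots[[g]] <- ggplot(one_gene, aes(time, Values.mean, group = mut, colour = mut)) +
    geom_line() +
    geom_point() +
    geom_errorbar(aes(ymin = Values.mean - Values.se,
                      ymax = Values.mean + Values.se), width = 0.1) +
    labs(title = paste("Gene", g), x = "Time", y = "Mean +/- SE")
  # To write each plot to its own file, uncomment:
  # ggsave(paste0("gene_", g, ".png"), plots[[g]], width = 6, height = 4)
}
```

Printing `plots[["A"]]` shows the plot for gene A. With 50 genes the same loop would produce 50 plots, whereas `facet_wrap` keeps them all in a single figure.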

modified 15 months ago • written 15 months ago by cpad0112

Thank you, cpad0112, for helping me this much even though you do not know me personally. I will never forget your help. When the right time comes, I promise I'll acknowledge all the people who helped me without knowing me. I wish I knew who you are (staying anonymous is best for a beginner like me). I appreciate your time and kindness! Wishing you the best!

written 15 months ago by Wuschel

No problem, and thank you. Whichever post helped you resolve the issue, please mark it as the answer. Good luck with your research.

Note: Help or a suggestion received on Biostars comes from the forum as a whole, and the reverse is also true: a question or issue raised here is not an individual one but a subject/knowledge-related one. If you wish to acknowledge help from this forum, you can cite or acknowledge the forum in technical writing (thesis, manuscript, presentation, abstract, etc.). Please contact the admins for instructions on how to cite the Biostars forum. Please note that this is merely a suggestion, not mandatory.

modified 15 months ago • written 15 months ago by cpad0112