Question: How to plot a multiple line graph with Mean and Std Error for following dataset?
2.7 years ago by Wox420 • HUJI

I have a data set of >100 different samples. The samples come from different genotypes (e.g. X, Y, Z) and 4 different time points (T0, T1, T2, T3), with 3 biological replicates each (R1, R2, R3). I'm measuring values for 50 different genes (in rows).

``````
structure(list(Gene = structure(1:2, .Label = c("A", "B"), class = "factor"),
X_T0_R1 = c(1.46559502, 0.220140568), X_T0_R2 = c(1.087642983,
0.237500819), X_T0_R3 = c(1.424945196, 0.21066267), X_T1_R1 = c(1.289943948,
0.207778662), X_T1_R2 = c(1.376535013, 0.488774258), X_T1_R3 = c(1.833390311,
0.182798731), X_T2_R1 = c(1.450753714, 0.247576125), X_T2_R2 = c(1.3094609,
0.390028842), X_T2_R3 = c(0.5953716, 1.007079177), X_T3_R1 = c(0.7906009,
0.730242116), X_T3_R2 = c(1.215333041, 1.012914813), X_T3_R3 = c(1.069312467,
0.780421013), Y_T0_R1 = c(0.053317766, 3.316414959), Y_T0_R2 = c(0.506623748,
3.599442788), Y_T0_R3 = c(0.713670106, 2.516735845), Y_T1_R1 = c(0.740998252,
1.444496448), Y_T1_R2 = c(0.648231834, 0.097957459), Y_T1_R3 = c(0.780499252,
0.187840968), Y_T2_R1 = c(0.35344654, 1.190274584), Y_T2_R2 = c(0.220223951,
1.367784148), Y_T2_R3 = c(0.432856978, 1.403057729), Y_T3_R1 = c(0.234963735,
1.232129062), Y_T3_R2 = c(0.353770497, 0.885122768), Y_T3_R3 = c(0.396091395,
1.333921747), Z_T0_R1 = c(0.398000559, 1.286528398), Z_T0_R2 = c(0.384759325,
1.122251177), Z_T0_R3 = c(1.582230097, 0.697419716), Z_T1_R1 = c(1.136843842,
0.804552001), Z_T1_R2 = c(1.275683837, 1.227821594), Z_T1_R3 = c(0.963349308,
0.968589683), Z_T2_R1 = c(3.765036263, 0.477443352), Z_T2_R2 = c(1.901023385,
0.832736132), Z_T2_R3 = c(1.407713024, 0.911920317), Z_T3_R1 = c(0.988333629,
1.095130142), Z_T3_R2 = c(0.618606729, 0.497458337), Z_T3_R3 = c(0.429823986,
0.471389536)), .Names = c("Gene", "X_T0_R1", "X_T0_R2", "X_T0_R3",
"X_T1_R1", "X_T1_R2", "X_T1_R3", "X_T2_R1", "X_T2_R2", "X_T2_R3",
"X_T3_R1", "X_T3_R2", "X_T3_R3", "Y_T0_R1", "Y_T0_R2", "Y_T0_R3",
"Y_T1_R1", "Y_T1_R2", "Y_T1_R3", "Y_T2_R1", "Y_T2_R2", "Y_T2_R3",
"Y_T3_R1", "Y_T3_R2", "Y_T3_R3", "Z_T0_R1", "Z_T0_R2", "Z_T0_R3",
"Z_T1_R1", "Z_T1_R2", "Z_T1_R3", "Z_T2_R1", "Z_T2_R2", "Z_T2_R3",
"Z_T3_R1", "Z_T3_R2", "Z_T3_R3"), class = "data.frame", row.names = c(NA,
-2L))
``````

For each gene (i.e. for each row), I want to plot the mean of the replicates for each genotype, plus SE. (Image: expected line graph pattern with SE.)

For example, for Gene A I want to draw one graph across the time points (0/1/3/5) that includes all the genotypes (X, Y, Z), so there should be 3 lines in the plot, looking like the plots above.

How is this possible using R? How can I include the standard error? Can I use a loop to generate 50 graphs (a separate graph for each row)?

R • 9.7k views
modified 2.7 years ago by GenoMax94k • written 2.7 years ago by Wox420

The values you furnished above have large deviations. See if the following plot works. The data is taken from the OP:

``````
df = read.csv("df1.txt", sep = "\t", stringsAsFactors = FALSE)

# Reshape from wide to long: one row per gene and sample column
library(tidyr)
df1 = gather(df, "TP", "Values", -Gene)

# Split sample names such as "X_T0_R1" into genotype, time and replicate
library(stringr)
df2 = cbind(df1, str_split_fixed(df1$TP, "_", 3))
colnames(df2)[4:6] = c("genotype", "time", "replicate")

# Summarise the replicates for each time/gene/genotype (N, mean, sd, se, ci)
library(Rmisc)
df4 = summarySE(df2, measurevar = "Values", groupvars = c("time", "Gene", "genotype"))

library(ggplot2)
ggplot(df4, aes(time, Values, group = genotype, color = genotype)) +
  geom_line() +
  geom_point() +
  facet_wrap( ~ Gene) +
  labs(title = "Gene expression over 16 hr", x = "Time (hr)", y = "Measurement") +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20),
    strip.text = element_text(size = 20),
    axis.title.y = element_text(size = 20),
    axis.title.x = element_text(size = 20),
    axis.text.x = element_text(size = 14),
    axis.text.y = element_text(size = 14)
  ) +
  # Shaded band is mean +/- sd; summarySE also returns an `se` column
  geom_ribbon(aes(ymax = Values + sd, ymin = Values - sd),
    alpha = 0.5,
    fill = "grey70",
    colour = NA
  )
``````
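Since the question asks for the standard error rather than the standard deviation, here is a minimal sketch of the same idea with SE error bars. It uses only the Gene A replicates for genotypes X and Y at T0 and T1 from the posted data, and computes SE by hand as `sd / sqrt(n)`. The names `long`, `se_fun` and `smry` are made up for this illustration; with `summarySE` the equivalent would presumably be to use its `se` column in place of `sd`.

```r
library(ggplot2)

# Gene A replicates for genotypes X and Y at T0 and T1 (values from the question)
long <- data.frame(
  genotype = rep(c("X", "Y"), each = 6),
  time     = rep(rep(c("T0", "T1"), each = 3), times = 2),
  Values   = c(1.46559502, 1.087642983, 1.424945196,   # X, T0
               1.289943948, 1.376535013, 1.833390311,  # X, T1
               0.053317766, 0.506623748, 0.713670106,  # Y, T0
               0.740998252, 0.648231834, 0.780499252), # Y, T1
  stringsAsFactors = FALSE
)

# Standard error of the mean: sd divided by the square root of the sample size
se_fun <- function(x) sd(x) / sqrt(length(x))

# Mean and SE of the replicates for each genotype and time point
smry <- aggregate(Values ~ genotype + time, data = long,
                  FUN = function(x) c(mean = mean(x), se = se_fun(x)))
smry <- do.call(data.frame, smry)  # flatten the matrix column from aggregate()
names(smry) <- c("genotype", "time", "mean", "se")

# One line per genotype, with mean +/- SE error bars
p <- ggplot(smry, aes(time, mean, group = genotype, colour = genotype)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.1)
p
```

With only 3 replicates per group the SE bars will be noticeably narrower than the SD band, so it is worth stating in the figure legend which one is shown.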

I'm working with your code. I'm getting an error after `df4=summarySE(df2, measurevar="Values", groupvars=c("time","Gene","genotype"))`:

Error in summarySE(df2, measurevar = "Values", groupvars = c("time", "Gene", : could not find function "summarySE"

Sorry, I forgot to add the following line: `library(Rmisc)`. The `summarySE` function is from the Rmisc library, so load Rmisc first. I have updated the code.

Thanks, cpad0112. Sorry to bother you, but I get another error message:

Error in combine_vars(data, params$plot_env, vars, drop = params$drop) : At least one layer must contain all variables used for facetting

Well, it would help if you could post your script (the ggplot part) here. Also check that you have a recent version of ggplot.

I see, I do not have ggplot. Is this different from ggplot2?

Where can I get this package? Googling doesn't help :(

``````
df <- read.csv("SI_AVG_Line.csv")
library(tidyr)
df1 <- gather(df,"Transitions","Values",-Targets)
library(stringr)
df2 <- cbind(df1,str_split_fixed(df1$Transitions,"_",3))
colnames(df2)[4:6]=c("genotype","time","replicate")
library(Rmisc)
df4 <- summarySE(df2, measurevar="Values", groupvars=c("time","Targets","genotype"))
ggplot(df4, aes(time, Values, group = genotype, color = genotype)) +
geom_line() +
geom_point() +
facet_wrap( ~ Gene) +
labs(title = "Gene Expression vs time", x = "Time (d)", y = "Area_counts") +
theme_linedraw() +
theme(
plot.title = element_text(hjust = 0.5, size = 20),
strip.text = element_text(size = 20),
axis.title.y = element_text(size = 20),
axis.title.x = element_text(size = 20),
axis.text.x = element_text(size = 14),
axis.text.y = element_text(size = 14)
) +
geom_ribbon(aes(ymax = Values + sd, ymin = Values - sd),
alpha = 0.5,
fill = "grey70",
colour=NA
)
``````

@cpad0112 Given below is the .csv file I am working with, where the error occurs, in case this is the problem!

``````
structure(list(Targets = c("A", "B", "C", "nor"), X_S1_0d_1 = c(1.940487232,
1.079594087, 1.459871602, 1), X_S1_0d_2 = c(1.940630815, 0.790986517, 0.836386383, 1), X_S1_0d_3 = c(2.05030161, 0.721115111, 0.802144144, 1), X_S1_1d_1 = c(0.927368618, 1.186737277, 0.765095737, 1), X_S1_1d_2 = c(1.159347963, 1.427045976, 1.196499915, 1), X_S1_1d_3 = c(1.009271935, 1.049367585, 0.748728559, 1), X_S1_3d_1 = c(0.794781558, 1.072762904, 1.288591327, 1), X_S1_3d_2 = c(0.698642658, 0.971534921, 0.923846091, 1), X_S1_3d_3 = c(0.938922191, 0.80228642, 1.433899521, 1), X_S1_5d_1 = c(0.768844884, 1.458863535, 0.880239008, 1), X_S1_5d_2 = c(0.586314866, 1.027767798, 0.831469797, 1), X_S1_5d_3 = c(0.604124099, 1.502330028, 1.101895903, 1), mut2_S1_0d_1 = c(2.085432338, 0.861943427, 0.509210189, 1), mut2_S1_0d_2 = c(1.774970153, 1.074569974, 3.128664718, 1), mut2_S1_0d_3 = c(2.003870102, 0.753483213, 1.047020362, 1), mut2_S1_1d_1 = c(1.168381858, 1.15001272, 0.580462548, 1), mut2_S1_1d_2 = c(1.33284456, 0.450460567, 0.959430252, 1), mut2_S1_1d_3 = c(1.106332747, 0.466636391, 0.660254618, 1), mut2_S1_3d_1 = c(0.859543853, 1.188445442, 1.044546139, 1), mut2_S1_3d_2 = c(1.022929555, 1.259366417, 1.776709656, 1), mut2_S1_3d_3 = c(0.917527143, 2.137370791, 0.669765284, 1), mut2_S1_5d_1 = c(0.642810843, 0.496709803, 0.801885112, 1), mut2_S1_5d_2 = c(0.879777521, 1.170165217, 1.793443182, 1), mut2_S1_5d_3 = c(0.816650769, 0.864352103, 0.768312731, 1), mut5_S1_0d_1 = c(1.936291138, 0.721197246, 1.885982652, 1), mut5_S1_0d_2 = c(2.136240851, 0.925363277, 0.282462799, 1), mut5_S1_0d_3 = c(1.986120429, 0.677085837, 0.124936834, 1), mut5_S1_1d_1 = c(1.346339786, 0.989266319, 1.396700558, 1), mut5_S1_1d_2 = c(1.489199506, 1.269083963, 1.48921516, 1), mut5_S1_1d_3 = c(1.584229502, 0.88246637, 2.25267634, 1), mut5_S1_3d_1 = c(0.755948531, 1.451613602, 0.898362008, 1), mut5_S1_3d_2 = c(0.824308907, 0.5962476, 0.523055204, 1), mut5_S1_3d_3 = c(0.753359409, 0.753222103, 0.948441646, 1), mut5_S1_5d_1 = c(0.788525215, 1.85338769, 0.951693842, 1), mut5_S1_5d_2 = c(1.010417043,
1.983625345, 1.086768544, 1), mut5_S1_5d_3 = c(0.630454563, 1.439599004, 1.416591771, 1), mut7_S1_0d_1 = c(1.672072567, 0.611243763, 0.705364938, 1), mut7_S1_0d_2 = c(1.738837658, 0.503828595, 0.499147343, 1), mut7_S1_0d_3 = c(2.149037252, 1.192787265, 1.226895377, 1), mut7_S1_1d_1 = c(1.421761015, 1.084490092, 0.497815065, 1), mut7_S1_1d_2 = c(1.068782794, 0.584950798, 0.38078948, 1), mut7_S1_1d_3 = c(1.229045044, 0.822348277, 0.449995849, 1), mut7_S1_3d_1 = c(0.890386073, 0.802513638, 0.757190729, 1), mut7_S1_3d_2 = c(1.022619118, 0.806565748, 0.645204575, 1), mut7_S1_3d_3 = c(0.80348663, 0.753993198, 0.593817113, 1), mut7_S1_5d_1 = c(0.780575903, 0.724989068, 0.92248483, 1), mut7_S1_5d_2 = c(0.743592574, 1.279872561, 1.201887432, 1), mut7_S1_5d_3 = c(0.522476113, 0.751493063, 0.899865367, 1), mut9_S1_0d_1 = c(1.247510942, 0.762934403, 2.009134613, 1), mut9_S1_0d_2 = c(1.159843529, 0.684622155, 0.499925077, 1), mut9_S1_0d_3 = c(1.247510942, 0.762934403, 2.205521099, 1), mut9_S1_1d_1 = c(1.139288266, 0.530593446, 0.767442607, 1), mut9_S1_1d_2 = c(1.257958733, 0.780701299, 0.77153391, 1), mut9_S1_1d_3 = c(1.230762109, 0.536139676, 0.742313942, 1), mut9_S1_3d_1 = c(0.809093089, 0.59528538, 0.804481151, 1), mut9_S1_3d_2 = c(0.853017549, 0.826757331, 1.141960538, 1), mut9_S1_3d_3 = c(0.813029821, 0.748971384, 1.964723247, 1), mut9_S1_5d_1 = c(0.797277294, 1.327830526, 0.943500196, 1), mut9_S1_5d_2 = c(0.669946954, 1.011869145, 0.979867227, 1), mut9_S1_5d_3 = c(0.525670301, 1.067407334, 0.76001394, 1)), .Names = c("Targets", "X_S1_0d_1", "X_S1_0d_2", "X_S1_0d_3", "X_S1_1d_1", "X_S1_1d_2", "X_S1_1d_3", "X_S1_3d_1", "X_S1_3d_2", "X_S1_3d_3", "X_S1_5d_1", "X_S1_5d_2", "X_S1_5d_3", "mut2_S1_0d_1", "mut2_S1_0d_2", "mut2_S1_0d_3", "mut2_S1_1d_1", "mut2_S1_1d_2", "mut2_S1_1d_3", "mut2_S1_3d_1", "mut2_S1_3d_2", "mut2_S1_3d_3", "mut2_S1_5d_1", "mut2_S1_5d_2", "mut2_S1_5d_3", "mut5_S1_0d_1", "mut5_S1_0d_2", "mut5_S1_0d_3", "mut5_S1_1d_1", "mut5_S1_1d_2", "mut5_S1_1d_3",
"mut5_S1_3d_1", "mut5_S1_3d_2", "mut5_S1_3d_3", "mut5_S1_5d_1", "mut5_S1_5d_2", "mut5_S1_5d_3", "mut7_S1_0d_1", "mut7_S1_0d_2", "mut7_S1_0d_3", "mut7_S1_1d_1", "mut7_S1_1d_2", "mut7_S1_1d_3", "mut7_S1_3d_1", "mut7_S1_3d_2", "mut7_S1_3d_3", "mut7_S1_5d_1", "mut7_S1_5d_2", "mut7_S1_5d_3", "mut9_S1_0d_1", "mut9_S1_0d_2", "mut9_S1_0d_3", "mut9_S1_1d_1", "mut9_S1_1d_2", "mut9_S1_1d_3", "mut9_S1_3d_1", "mut9_S1_3d_2", "mut9_S1_3d_3", "mut9_S1_5d_1", "mut9_S1_5d_2", "mut9_S1_5d_3"), class = "data.frame", row.names = c(NA, -4L))
``````


The issue with the code is here: `df2 <- cbind(df1,str_split_fixed(df1$Transitions,"_",3))`. When you split a column name (example: `mut9_S1_3d_2`) on `_`, you get 4 strings, but you are asking for only 3. Also, I meant ggplot2, not ggplot.
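To make the splitting problem concrete, here is a small sketch of what `str_split_fixed` returns for the example name when asked for 3 and then 4 pieces. The variable names `nm`, `three` and `four` are just for illustration.

```r
library(stringr)

nm <- "mut9_S1_3d_2"

# Asking for 3 pieces: the last piece keeps the unsplit remainder
three <- str_split_fixed(nm, "_", 3)

# Asking for 4 pieces: time and replicate end up in separate columns
four <- str_split_fixed(nm, "_", 4)

print(three)
print(four)
```

With 3 pieces, the "replicate" column actually holds time and replicate stuck together, so every sample lands in its own group and there is nothing to average. Separately, the posted script facets on `Gene` while this file's column is called `Targets`, which may be what triggers the faceting error; the corrected code uses `facet_wrap( ~ Targets)`.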

code:

``````
# `test` is the data frame posted above, e.g. test <- read.csv("SI_AVG_Line.csv")
library(tidyr)
df1 = gather(test, "TP", "Values", -Targets)

# Names such as "mut9_S1_3d_2" have four "_"-separated parts
library(stringr)
df2 = cbind(df1, str_split_fixed(df1$TP, "_", 4))
colnames(df2)[4:7] = c("a", "b", "time", "replicate")
df2$mut = paste(df2$a, df2$b, sep = "_")  # e.g. "mut9_S1"

library(dplyr)
df3 = select(df2, -c(a, b))

# Summarise the replicates for each time/genotype/target (N, mean, sd, se, ci)
library(Rmisc)
names(df3)
df4 = summarySE(df3, measurevar = "Values", groupvars = c("time", "mut", "Targets"))
View(df4)  # optional: inspect the summary table

library(ggplot2)
ggplot(df4, aes(time, Values, group = mut, color = mut)) +
  geom_line() +
  geom_point() +
  facet_wrap( ~ Targets) +
  labs(title = "Gene expression ", x = "Time (d)", y = "Measurement") +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20),
    strip.text = element_text(size = 20),
    axis.title.y = element_text(size = 20),
    axis.title.x = element_text(size = 20),
    axis.text.x = element_text(size = 14),
    axis.text.y = element_text(size = 14)
  ) +
  # Shaded band is mean +/- sd
  geom_ribbon(aes(ymax = Values + sd, ymin = Values - sd),
    alpha = 0.3,
    fill = "grey70",
    colour = NA
  )
``````
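As an aside, the gather, split and paste steps can probably be collapsed into one call with `pivot_longer` from newer tidyr versions (1.0.0 and later, as far as I know). This is only a sketch on two columns of the posted table. The data frame `wide` and the regular expression are my own assumptions about how the names are built: everything before the last two `_` is the genotype, followed by time and replicate.

```r
library(tidyr)

# Two sample columns from the posted table, enough to show the reshaping
wide <- data.frame(
  Targets      = c("A", "B"),
  X_S1_0d_1    = c(1.940487232, 1.079594087),
  mut9_S1_3d_2 = c(0.853017549, 0.826757331),
  stringsAsFactors = FALSE
)

# Each capture group in names_pattern becomes one of the names_to columns
long <- pivot_longer(
  wide,
  cols = -Targets,
  names_to = c("mut", "time", "replicate"),
  names_pattern = "^(.+)_([^_]+)_([^_]+)$",
  values_to = "Values"
)
print(long)
```

The result has the same `mut`, `time`, `replicate` and `Values` columns that `summarySE` is fed in the code above, without the temporary `a` and `b` columns.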

With error bars only (mean ± SD):

For error bars, replace the following layer in the ggplot call:

``````
geom_ribbon(aes(ymax = Values + sd, ymin = Values - sd),
  alpha = 0.3,
  fill = "grey70",
  colour = NA
)
``````

with `geom_pointrange(aes(ymax = Values + sd, ymin = Values - sd))`.
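On the original question about looping to get a separate graph per gene: a rough sketch, assuming long-format data with one row per replicate. Here ggplot2 computes mean ± SE itself through `stat_summary` with `mean_se`, so no summary table is needed. The data frame `long` holds a few values copied from the posted table (targets A and B, genotypes X_S1 and mut2_S1, days 0 and 1); the `plots` list and the file name pattern in the commented `ggsave` line are my own choices.

```r
library(ggplot2)

# Replicate values copied from the posted table (subset only)
long <- data.frame(
  Targets = rep(c("A", "B"), each = 12),
  mut     = rep(rep(c("X_S1", "mut2_S1"), each = 6), times = 2),
  time    = rep(rep(c("0d", "1d"), each = 3), times = 4),
  Values  = c(1.940487232, 1.940630815, 2.05030161,   # A, X_S1, 0d
              0.927368618, 1.159347963, 1.009271935,  # A, X_S1, 1d
              2.085432338, 1.774970153, 2.003870102,  # A, mut2_S1, 0d
              1.168381858, 1.33284456, 1.106332747,   # A, mut2_S1, 1d
              1.079594087, 0.790986517, 0.721115111,  # B, X_S1, 0d
              1.186737277, 1.427045976, 1.049367585,  # B, X_S1, 1d
              0.861943427, 1.074569974, 0.753483213,  # B, mut2_S1, 0d
              1.15001272, 0.450460567, 0.466636391),  # B, mut2_S1, 1d
  stringsAsFactors = FALSE
)

# Build one plot per target and keep them in a named list
plots <- list()
for (g in unique(long$Targets)) {
  one_gene <- long[long$Targets == g, ]
  plots[[g]] <- ggplot(one_gene, aes(time, Values, group = mut, colour = mut)) +
    stat_summary(fun = mean, geom = "line") +               # line through the means
    stat_summary(fun.data = mean_se, geom = "pointrange") + # mean +/- 1 SE
    labs(title = paste("Target", g), x = "Time (d)", y = "Measurement")
  # To save each plot to its own file, something like:
  # ggsave(paste0("target_", g, ".pdf"), plots[[g]], width = 6, height = 4)
}
```

With 50 genes this gives 50 separate plots, which may be easier to read than 50 facets on one page.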

Thank you, cpad0112. Thank you for helping me this much even though you do not know me personally. I will never forget your help. When the right time comes, I promise I'll acknowledge all the people who helped me without knowing me. I wish I knew who you are (being anonymous is best for a beginner like me). I appreciate your time and kindness!!! Wishing you the best!!!

No problem, and thank you. Whichever post helped you resolve the issue, mark it as the answer. Good luck with your research.

Note: When one gets help or a suggestion on Biostars, it is help from the forum (Biostars) as a whole, and the reverse is also true: when one posts a question or issue, it is not an individual matter but a subject/knowledge-related issue. If you wish to acknowledge any help or suggestion from this forum, you can cite or acknowledge the forum in technical writing (thesis, manuscript, presentation, abstract, etc.). Please contact the admins for instructions on how to cite the Biostars forum. Please note that the acknowledgement/citation mentioned above is merely a suggestion, not mandatory.