5 months ago

bioinformatics

Hi,

Would anyone be able to help me add error bars to my scatterplot of expression values derived from a microarray differential expression analysis?

Please find below the commands I have used to make the graph:

```
path <- "/Users/DDLPS.csv"
df <- read.csv(path, header = TRUE, sep = ',')
Plot <- ggplot(df, aes(Samples, Expression.value, colour = Tumour.type)) + geom_point()
print(Plot + ggtitle("Gene expression differences of X between WDLPS and DDLPS"))
```

Expression values are in the table below.

```
head(df)
Samples Expression.value Tumour.type
1 GSM766533.CEL 10.013128 DDLPS
2 GSM766534.CEL 9.293059 DDLPS
3 GSM766535.CEL 10.821439 DDLPS
4 GSM766536.CEL 10.494755 DDLPS
5 GSM766537.CEL 10.736248 DDLPS
6 GSM766538.CEL 10.067121 DDLPS
```

I have tried to add the error bar with the following command but received an error message:

```
TIMP1Plot <- ggplot(df, aes(Expression.value, Samples, colour = Tumour.type)) + geom_point() + geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width = 0.05)
```

Error in mean - sd : non-numeric argument to binary operator

Can anyone help me correct this?

Thanks!

Why not simply put a `geom_boxplot` alongside the scatter?

Thanks! Where exactly do I put the `geom_boxplot` in the command?

`+ geom_point() + geom_boxplot()`

OK, thanks. I have now done this; however, the error bar appeared on the legend, not on the points. If I wanted to add an error bar for each point on the plot, would the command change?

Sorry, I didn't read your question through. You should have columns in `df` named `mean` and `sd` for the expression in `geom_errorbar` to be evaluated with.

In response to this comment: You've got a bit of a mess here. First of all, you switched the parameters `Samples` and `Expression.value`: `Samples` should be on the X axis (first parameter) and `Expression.value` on the Y axis (second). Next, what is it that you're plotting? Is it a single gene in multiple samples? What is the meaning of the mean and SD here? If you want the mean and SD of the expression in all the samples, then `geom_boxplot` should give you this (it actually gives you something a bit [different](https://www.r-bloggers.com/2012/06/whisker-of-boxplot/)).
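To illustrate why the original call fails, here is a minimal sketch in base R. It assumes the cause is what the comment above suggests: because `df` has no columns called `mean` or `sd`, those names resolve to the built-in functions `mean()` and `sd()`, and subtracting one function from another is not defined. The three expression values are taken from the `head(df)` output in the question.

```r
# A data frame with no `mean` or `sd` columns, as in the question.
df <- data.frame(Expression.value = c(10.013128, 9.293059, 10.821439))

# Evaluating `mean - sd` against df falls back to the functions mean() and sd(),
# so the subtraction fails; the error message is captured instead of stopping.
err <- tryCatch(
  with(df, mean - sd),
  error = function(e) conditionMessage(e)
)
print(err)
```

The captured message should match the "non-numeric argument to binary operator" error reported in the question.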

Thank you for your feedback. I have now switched Samples and Expression.value to the correct parameters. I'm plotting the expression values of a single gene in 92 samples, each classed as either a WDLPS or a DDLPS tumour. I have calculated the mean expression of all samples and then the SD using the mean.

I have used the following commands:

and I still get a graph with all the error bars spread across one point on the y axis.

What is the meaning of `sd` for an individual point? SD is relevant when considering a population. You can compute the mean and SD of the populations you have here (samples in the two tumor types).

The SD of an individual point is how far the point (the expression value of the gene in that sample) is spread from the mean expression of all the samples. Also, the smaller the bar, the more reliable the value, and the larger the bar, the less reliable. To calculate this I used the following formula: `=STDEV.S(B2:D2)`, where B2 = mean expression and D2 = expression value of the particular sample.

You're misusing this formula. STDEV.S is for estimating the standard deviation of a population. What you are computing is basically the distance of each point from the mean which is meaningless in this context.

Ok thanks for your help. How might I correctly calculate the SD?

Compute it using all the expression values in the population (samples in each tumor type I assume). You will end up with one value for each population, alongside one mean value for each population.
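As a rough sketch of that calculation in base R: the six DDLPS values below come from the `head(df)` output in the question, while the three WDLPS values and the short sample names are made up purely for illustration. The name `summary_df` is also hypothetical.

```r
# Six DDLPS values from the question; the WDLPS values are invented placeholders.
df <- data.frame(
  Samples = paste0("S", 1:9),
  Expression.value = c(10.013128, 9.293059, 10.821439,
                       10.494755, 10.736248, 10.067121,
                       8.1, 8.4, 8.9),
  Tumour.type = rep(c("DDLPS", "WDLPS"), times = c(6, 3))
)

# One mean and one SD per tumour type, computed from all samples in that group.
group_mean <- tapply(df$Expression.value, df$Tumour.type, mean)
group_sd   <- tapply(df$Expression.value, df$Tumour.type, sd)

# A separate summary table: one row per population, not one row per sample.
summary_df <- data.frame(
  Tumour.type = names(group_mean),
  mean = as.numeric(group_mean),
  sd   = as.numeric(group_sd)
)
print(summary_df)
```

Keeping the summary in its own data frame means the per-sample table stays untouched, and the two rows of `summary_df` are what a pair of error bars would be drawn from.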

Ok thanks. I have ended up with a mean value and sd for each population. Is this table correct?

Each row is one sample, so the mean and SD should not be part of the table. There are elegant ways to plot the mean and SD of a population with ggplot; a boxplot is the most common one (it shows the median rather than the mean, but it still describes the population).
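A minimal sketch of the boxplot approach, assuming the goal is one box per tumour type with the individual samples drawn on top. The key change from the commands in the question is putting `Tumour.type` on the x axis rather than `Samples`, since with one sample per x position each box would contain a single point. The WDLPS values are made up for illustration; the DDLPS values are from the question.

```r
library(ggplot2)

# Six DDLPS values from the question; the WDLPS values are invented placeholders.
df <- data.frame(
  Expression.value = c(10.013128, 9.293059, 10.821439,
                       10.494755, 10.736248, 10.067121,
                       8.1, 8.4, 8.9),
  Tumour.type = rep(c("DDLPS", "WDLPS"), times = c(6, 3))
)

# One box per tumour type, with the samples jittered horizontally on top.
# outlier.shape = NA stops outliers being drawn twice (by the box and as points).
p <- ggplot(df, aes(Tumour.type, Expression.value, colour = Tumour.type)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1, height = 0) +
  ggtitle("Gene expression differences of X between WDLPS and DDLPS")

# Call print(p) to draw the plot.
```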

Ok thanks for your response. If the sd and mean values are not part of the table where should I put them?

I'm hoping to end up with 2 error bars, one across the WDLPS points and another across the DDLPS points.

You can use `stat_summary` with the function `mean_cl_boot`, for instance.

Thanks for your help, it still didn't work.
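For reference, a hedged sketch of the `stat_summary` route that yields two error bars, one per tumour type. Note that `mean_cl_boot` needs the Hmisc package to be installed, which is one possible reason a call might fail; to avoid that dependency, this sketch uses a small helper, `mean_sd` (a hypothetical name, not part of ggplot2), that returns the mean and mean ± 1 SD. The WDLPS values are made up for illustration.

```r
library(ggplot2)

# Six DDLPS values from the question; the WDLPS values are invented placeholders.
df <- data.frame(
  Expression.value = c(10.013128, 9.293059, 10.821439,
                       10.494755, 10.736248, 10.067121,
                       8.1, 8.4, 8.9),
  Tumour.type = rep(c("DDLPS", "WDLPS"), times = c(6, 3))
)

# Hypothetical helper: summarises one group as its mean and mean +/- 1 SD.
mean_sd <- function(x) {
  data.frame(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
}

# Tumour.type on the x axis, so each summary is computed per population.
p <- ggplot(df, aes(Tumour.type, Expression.value, colour = Tumour.type)) +
  geom_jitter(width = 0.1, height = 0) +
  stat_summary(fun.data = mean_sd, geom = "errorbar", width = 0.2) +
  stat_summary(fun.data = mean_sd, geom = "point", shape = 4, size = 3)

# Call print(p) to draw the plot.
```

If bootstrap confidence intervals are preferred and Hmisc is available, `fun.data = mean_cl_boot` can be swapped in for `mean_sd`.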