Question: Showing samples inside or outside a range around a central tendency
6 months ago, F (3.4k), Iran, wrote:

Hi,

I have calculated the mean, mean - SD and mean + SD for a set of samples, based on some negative controls. I want to show which samples fall inside and outside the ± 1 SD range, but I don't know how to highlight (bold) those samples, something like this:

(image not shown)

Any help would be appreciated.

> head(data)
   sample      mean  mean+sd    mean-sd
1:     A2 -1.210713 1.541450 -3.9628767
2:     A3  3.125620 5.877783  0.3734567
3:     A4  2.687265 5.439429 -0.0648978
4:     A6  4.989040 7.741203  2.2368766
5:     A7 -1.194626 1.557537 -3.9467896
6:     A8 -1.628225 1.123939 -4.3803880
>
Tags: ggplot2, R

I would suggest you use a violin plot with a swarm plot, rather than a box plot, e.g.:

http://i65.tinypic.com/2d14e1z.jpg

See also this tweet for what can be wrong with boxplots:

More plotting suggestions can be found in this blog post.
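
A minimal ggplot2 sketch of this kind of plot, with the out-of-range samples labelled in bold. It is only an illustration under assumptions: it uses the six sample means shown in the question, takes the range to be the grand mean of those means plus or minus one SD, and uses plain points rather than a true swarm layout (which would need an extra package). The names `dat`, `centre`, `spread` and `outside` are made up for this example.

```r
library(ggplot2)

# Sample means copied from the head(data) output in the question
dat <- data.frame(
  sample = c("A2", "A3", "A4", "A6", "A7", "A8"),
  mean   = c(-1.210713, 3.125620, 2.687265, 4.989040, -1.194626, -1.628225),
  stringsAsFactors = FALSE
)

# Assumed range: grand mean of the sample means +/- 1 SD of the sample means
centre <- mean(dat$mean)
spread <- sd(dat$mean)
dat$outside <- abs(dat$mean - centre) > spread

p <- ggplot(dat, aes(x = "negative controls", y = mean)) +
  geom_violin(fill = "grey90", colour = "grey60") +
  geom_point(aes(colour = outside), size = 2) +
  # Dashed lines mark the lower and upper limits of the range
  geom_hline(yintercept = centre + c(-1, 1) * spread, linetype = "dashed") +
  # Label only the samples outside the range, in bold
  geom_text(
    data = dat[dat$outside, ],
    aes(label = sample),
    fontface = "bold", nudge_x = 0.25
  ) +
  scale_colour_manual(values = c(`FALSE` = "grey40", `TRUE` = "red")) +
  labs(x = NULL, y = "Mean of negative-control probes", colour = "Outside 1 SD")

print(p)
```

On the full dataset the limits would be computed from all samples, so the set of labelled points may differ from what these six rows give.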

modified 6 months ago • written 6 months ago by WouterDeCoster (40k)

Thank you. I need the names of the out-of-range samples to be highlighted so that I can exclude them.

written 6 months ago by F (3.4k)

But then why do you need a plot? Just use the values and set cutoffs...
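
A rough base-R sketch of the cutoff approach, assuming the range is the grand mean of the sample means plus or minus `k` standard deviations (with `k = 1`, as described elsewhere in this thread). Only the six rows shown in the question are used, and `dat`, `k`, `lower`, `upper` and `status` are illustrative names, not anything from the original data.

```r
# Sample means copied from the head(data) output in the question
dat <- data.frame(
  sample = c("A2", "A3", "A4", "A6", "A7", "A8"),
  mean   = c(-1.210713, 3.125620, 2.687265, 4.989040, -1.194626, -1.628225),
  stringsAsFactors = FALSE
)

k <- 1  # assumed cutoff width, in SD units

# Limits of the range, computed across the sample means
lower <- mean(dat$mean) - k * sd(dat$mean)
upper <- mean(dat$mean) + k * sd(dat$mean)

# Classify each sample relative to the range
dat$status <- ifelse(dat$mean < lower, "below",
              ifelse(dat$mean > upper, "above", "in range"))

# Names of the samples to exclude
flagged <- dat$sample[dat$status != "in range"]
print(dat)
print(flagged)
```

Changing `k` to 2 or 3 gives a more conservative filter without touching the rest of the code.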

written 6 months ago by WouterDeCoster (40k)

By eye, it looks as if all samples are in range. However, I have two datasets, and I need to work out which of the matched samples across the two deviates more from this range :(
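
One possible way to make that comparison, sketched under assumptions: express each sample's mean as a deviation from its own dataset's grand mean in SD units, then compare the absolute values for matched samples. The first data frame uses the six means from the question; the second contains made-up numbers purely for illustration, and `deviation_in_sd` is a hypothetical helper.

```r
# Dataset 1: sample means copied from head(data) in the question
d1 <- data.frame(
  sample = c("A2", "A3", "A4", "A6", "A7", "A8"),
  mean   = c(-1.210713, 3.125620, 2.687265, 4.989040, -1.194626, -1.628225),
  stringsAsFactors = FALSE
)

# Dataset 2: made-up values for illustration only (NOT from the thread)
d2 <- data.frame(
  sample = c("A2", "A3", "A4", "A6", "A7", "A8"),
  mean   = c(0.5, 2.0, -3.5, 1.0, 0.2, -0.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: signed deviation from the grand mean, in SD units
deviation_in_sd <- function(x) (x - mean(x)) / sd(x)

d1$z <- deviation_in_sd(d1$mean)
d2$z <- deviation_in_sd(d2$mean)

# Match samples by name; the suffixes record the source dataset
both <- merge(d1, d2, by = "sample", suffixes = c(".d1", ".d2"))

# For each matched sample, note where it sits further from the centre
both$worse_in <- ifelse(abs(both$z.d1) > abs(both$z.d2), "dataset 1", "dataset 2")
print(both[, c("sample", "z.d1", "z.d2", "worse_in")])
```

A sample with an absolute deviation above 1 in either column falls outside the 1 SD range for that dataset.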

written 6 months ago by F (3.4k)

A side note: the mean should be accompanied by the standard error of the mean (SEM), not the standard deviation.

written 6 months ago by Santosh Anand (4.9k)

Could you explain why? I mean, the SEM is used (colloquially) to indicate how far the sample mean is likely to be from the population mean. What relevance is that when trying to decide whether an individual sampled point is an outlier within a given sample from the population?

modified 6 months ago • written 6 months ago by russhh (4.6k)

Actually, this is an HTG EdgeSeq assay. In this assay we have 4 negative probes with which we can check the quality of each sample. The mean of the raw counts assigned to the negative probes in a sample should fall within plus or minus one standard deviation of the overall mean of the sample means. I want to identify bad samples as outliers.

written 6 months ago by F (3.4k)

Sorry, I wasn't referring to you F. It was the advice from Santosh that I was questioning

written 6 months ago by russhh (4.6k)

Hi russhh, and sorry for the late follow-up. I see your point and you are right. I looked at the problem hurriedly, and as the OP was calculating sample means and SDs in the data above, I thought it was about putting error bars on the estimates of the sample means. But since it is about outlier detection among sample points, the SD of the samples is the correct approach. However, I am not sure how much power a 1-SD cutoff will give for identifying outliers, as about 1/3 of the data lies outside 1 SD in any normal distribution.
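
That fraction can be checked quickly in R. The snippet below is just a sanity check assuming a normal distribution; `frac_outside` is a name invented here.

```r
# Two-sided tail probability outside mean +/- k SD for a normal distribution
frac_outside <- function(k) 2 * pnorm(-k)

frac_outside(1)  # about 0.317, i.e. roughly a third of the data
frac_outside(2)  # about 0.046
frac_outside(3)  # about 0.003
```

So with a 1-SD cutoff, roughly one sample in three would be flagged even when nothing is wrong, whereas 2 or 3 SD flags far fewer.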

written 6 months ago by Santosh Anand (4.9k)