Question

Showing samples inside or outside a range around a central tendency

6.5 years ago

zizigolu ★ 4.4k

Hi,

I have calculated the mean, mean - sd and mean + sd for a set of samples, based on some negative controls. I want to show which samples fall inside and which fall outside the range of plus or minus one SD, but I don't know how to highlight (bold) the out-of-range samples, something like this:

[image: example plot]

Any help would be appreciated.

> head(data)
   sample      mean  mean+sd    mean-sd
1:     A2 -1.210713 1.541450 -3.9628767
2:     A3  3.125620 5.877783  0.3734567
3:     A4  2.687265 5.439429 -0.0648978
4:     A6  4.989040 7.741203  2.2368766
5:     A7 -1.194626 1.557537 -3.9467896
6:     A8 -1.628225 1.123939 -4.3803880
>

R ggplot2 • 2.1k views

I would suggest using a violin plot overlaid with a swarm plot, rather than a box plot.

See also this tweet for what can be wrong with boxplots:

I'd prefer to show all the data, but if you want a summary visual, violin plots do a nice job of showing the distribution. pic.twitter.com/bZuTl1lIhn
— Justin Matejka (@JustinMatejka) August 9, 2017

More plotting suggestions can be found in this blog post.

6.5 years ago by WouterDeCoster 48k
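
The violin-plus-points suggestion above could be sketched roughly as follows in ggplot2. The data frame is simulated and purely hypothetical (nothing here comes from the original post), and `geom_jitter()` is used as a simple stand-in for a true swarm layout; the `ggbeeswarm` package offers `geom_beeswarm()` if a real swarm is preferred.

```r
library(ggplot2)

# Hypothetical data: simulated values for two groups (not from the thread)
set.seed(1)
df <- data.frame(
  group = rep(c("control", "treated"), each = 50),
  value = c(rnorm(50, mean = 2, sd = 1), rnorm(50, mean = 3, sd = 1.5))
)

# The violin shows the shape of each distribution;
# the jittered points show every individual observation on top of it
p <- ggplot(df, aes(x = group, y = value)) +
  geom_violin(fill = "grey90") +
  geom_jitter(width = 0.1, height = 0, alpha = 0.6)
print(p)
```

Setting `height = 0` keeps the points at their true y values, so only the horizontal position is jittered.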

Thank you. I need the names of the out-of-range samples shown in bold so that I can exclude them.

6.5 years ago by zizigolu ★ 4.4k
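
One possible way to get bold labels for out-of-range samples in ggplot2 is sketched below. It uses only the six means shown in the question, and it assumes the band is the mean of those sample means plus or minus one SD of them; the column name `out` and the colours are arbitrary choices, not anything from the thread.

```r
library(ggplot2)

# The six sample means posted in the question
df <- data.frame(
  sample = c("A2", "A3", "A4", "A6", "A7", "A8"),
  mean   = c(-1.210713, 3.125620, 2.687265, 4.989040, -1.194626, -1.628225)
)

# Assumed rule: band = mean of the sample means +/- one SD of the sample means
centre <- mean(df$mean)
spread <- sd(df$mean)
df$out <- abs(df$mean - centre) > spread

p <- ggplot(df, aes(x = sample, y = mean)) +
  geom_hline(yintercept = centre) +
  geom_hline(yintercept = centre + c(-1, 1) * spread, linetype = "dashed") +
  geom_point(aes(colour = out), size = 3) +
  # one fontface value per row: bold for samples outside the band
  geom_text(aes(label = sample), vjust = -1,
            fontface = ifelse(df$out, "bold", "plain")) +
  scale_colour_manual(values = c("FALSE" = "grey40", "TRUE" = "red"),
                      name = "outside band")
print(p)

# Names of the samples drawn in bold
df$sample[df$out]
```

With only these six rows, A6 is the one sample outside the band; on the full dataset the band and the flagged samples would differ.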

But then why do you need a plot? Just use the values and set cutoffs...

6.5 years ago by WouterDeCoster 48k
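
Setting cutoffs on the values directly, as suggested above, might look something like this in base R. The helper name `flag_out_of_range` and its `k` argument are made up for illustration, and the input is just the six means shown in the question.

```r
# Return the names of values lying more than k SDs from the mean
# (helper name and k are illustrative, not from the thread)
flag_out_of_range <- function(x, k = 1) {
  centre <- mean(x)
  spread <- sd(x)
  names(x)[abs(x - centre) > k * spread]
}

# The six sample means posted in the question
means <- c(A2 = -1.210713, A3 = 3.125620, A4 = 2.687265,
           A6 = 4.989040, A7 = -1.194626, A8 = -1.628225)

flag_out_of_range(means)         # samples beyond 1 SD
flag_out_of_range(means, k = 2)  # stricter cutoff
```

On these six values the 1 SD cutoff flags only A6, and the 2 SD cutoff flags nothing.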

By eye, it looks like all samples are in range. However, I have two datasets, and for the samples matched between them I need to compare which one deviates more from this range :(

6.5 years ago by zizigolu ★ 4.4k
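
For the two-dataset comparison, one option (an assumption, not something proposed in the thread) is to express each sample's distance from its own dataset's mean in SD units and then compare the absolute values for matched samples. The numbers below are invented for illustration.

```r
# Hypothetical negative-control means for the same samples in two datasets
d1 <- c(A2 = -1.2, A3 = 3.1, A4 = 2.7, A6 = 5.0)
d2 <- c(A2 =  0.4, A3 = 2.9, A4 = 6.5, A6 = 3.2)

# Distance from each dataset's own mean, in units of that dataset's SD
z1 <- (d1 - mean(d1)) / sd(d1)
z2 <- (d2 - mean(d2)) / sd(d2)

# Keep only samples present in both datasets
shared <- intersect(names(z1), names(z2))
cmp <- data.frame(
  sample = shared,
  z1 = z1[shared],
  z2 = z2[shared],
  row.names = NULL
)

# For each matched sample, which dataset shows the larger deviation?
cmp$more_deviant <- ifelse(abs(cmp$z1) > abs(cmp$z2), "dataset1", "dataset2")
cmp
```

Working in SD units puts the two datasets on a common scale, so a sample can be compared across them even if their raw count ranges differ.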

A side note: the mean should be accompanied by the standard error of the mean (SEM), not the standard deviation.

6.5 years ago by Santosh Anand 5.8k

Could you explain why? I mean, the SEM is used (colloquially) to indicate how far the sample mean is likely to be from the population mean. What relevance is that when trying to decide whether an individual sampled point is an outlier within a given sample from the population?

6.5 years ago by russhh 5.8k

Actually, this is an HTG EdgeSeq assay. In this assay we have 4 negative-control probes with which we can check sample quality. For each sample, the mean of the raw counts from the negative probes should lie within plus or minus one standard deviation of the overall mean of the sample means. I want to identify bad samples as outliers.

6.5 years ago by zizigolu ★ 4.4k
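
The rule described above could be sketched from probe-level counts roughly like this. The count matrix, probe names and sample names are all hypothetical; only the rule itself (per-sample mean of the negative probes within one SD of the mean of the sample means) is taken from the post.

```r
# Hypothetical raw counts: 4 negative-control probes (rows) x 6 samples (columns)
# matrix() fills column by column, so each row of numbers below is one sample
counts <- matrix(
  c(10, 12,  9, 11,
    11, 10, 12, 10,
     9, 11, 10, 12,
    30, 28, 35, 31,
    10,  9, 11, 10,
    12, 11, 10,  9),
  nrow = 4,
  dimnames = list(paste0("NEG", 1:4), paste0("S", 1:6))
)

# Mean of the negative probes for each sample
sample_means <- colMeans(counts)

# Overall mean and SD of those per-sample means
grand_mean <- mean(sample_means)
grand_sd   <- sd(sample_means)

# A sample passes if its mean lies within one SD of the overall mean
in_range <- abs(sample_means - grand_mean) <= grand_sd
names(sample_means)[!in_range]
```

In this made-up matrix only S4 fails. Note that a very extreme sample also inflates the SD it is judged against, so the band widens as bad samples get worse.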

Sorry, I wasn't referring to you F. It was the advice from Santosh that I was questioning

6.5 years ago by russhh 5.8k

Hi russhh, and sorry for the late follow-up. I see your point, and you are right. I looked at the problem hurriedly, and as the OP was calculating sample means and sd in the data above, I thought it was about putting error bars on the estimates of the sample means. But since it is about detecting outlying samples, the sd of the samples is the correct approach. However, I am not sure how much power a 1-sd cutoff will give to identify outliers, as about one third of the data lie outside 1 sd in any normal distribution.

6.5 years ago by Santosh Anand 5.8k
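
The "about one third" figure can be checked quickly in R, assuming the values really are normally distributed. The helper name `frac_outside` is just for illustration.

```r
# Expected fraction of a normal distribution lying more than k SDs from the mean
# (both tails, hence the factor of 2)
frac_outside <- function(k) 2 * pnorm(-k)

round(frac_outside(1:3), 4)
# 0.3173 0.0455 0.0027
```

So with a 1 SD cutoff roughly 32% of perfectly good samples would be expected to fall outside the band, compared with about 4.6% at 2 SD and 0.3% at 3 SD.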