How to compare these distributions?
6.9 years ago
samuelmiver ▴ 440

I have two data frames, one per condition, each with three columns:

specie: a qualitative value (the species name)

population@4days: population calculated for that species at 4 days

population@12days: population calculated for that species at 12 days

Using basic calculations, I have computed the growth rate for each population, and now I want to compare the two distributions of rates (I am interested in a global conclusion, not one specific to each species).
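The post doesn't say which "basic calculations" produced the growth rates. A minimal sketch, assuming the rate is the per-day exponential growth rate between the two time points, r = ln(N12 / N4) / (12 - 4); the dictionary layout and species names are hypothetical stand-ins for one condition's data frame.

```python
import math

# Hypothetical counts for one condition (not data from the thread).
# Keys play the role of the "specie" column; values are
# (population@4days, population@12days).
condition_a = {
    "sp1": (1_000, 8_000),
    "sp2": (500, 1_000),
    "sp3": (2_000, 2_000),
}

def growth_rates(table, t1=4.0, t2=12.0):
    """Per-day exponential growth rate r = ln(N(t2) / N(t1)) / (t2 - t1).

    Assumes both counts are positive; a species with a zero count would
    need a pseudocount or separate handling.
    """
    dt = t2 - t1
    return {sp: math.log(n2 / n1) / dt for sp, (n1, n2) in table.items()}

rates_a = growth_rates(condition_a)
print(rates_a)
```

A log-ratio rate like this is unaffected if both time points of a species are multiplied by the same factor, which is one reason it is a common choice when totals differ.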

My question is whether I need to normalize the data to compare them correctly. The number of species, and which species are present, differ between the data frames, and the total population also differs between all of them.

If I have to, which will be the best way to normalize and compare them?
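One way to see what a normalization does to log-ratio growth rates: converting counts to relative abundances (each count divided by the total at that time point) shifts every species' rate in a condition by the same constant, ln(total@4 / total@12) / dt. The shape of the distribution is unchanged, but its location moves, and because the constant depends on each condition's total growth it can change a between-condition comparison. A small sketch with hypothetical counts, assuming the per-day log-ratio rate:

```python
import math

def log_rates(counts_t1, counts_t2, dt=8.0):
    # Per-day exponential growth rate for each species (lists in the same order).
    return [math.log(b / a) / dt for a, b in zip(counts_t1, counts_t2)]

def relative(counts):
    # Convert counts to relative abundances (fractions of the total).
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical counts at 4 and 12 days for three species.
day4 = [1000, 500, 2000]
day12 = [8000, 1000, 2000]

raw = log_rates(day4, day12)
norm = log_rates(relative(day4), relative(day12))

# Every normalized rate differs from its raw rate by the same amount.
shift = math.log(sum(day4) / sum(day12)) / 8.0
print([round(n - r, 6) for r, n in zip(raw, norm)], round(shift, 6))
```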

Thank you in advance for your consideration.

Tags: sequencing, statistics, growth, distributions

if I understand your question correctly, you are asking whether (and how) to normalize and compare the populations at 4 vs. 12 days, is that correct?

if so, do you have a day 0? that would be the initial point to normalize the data to.

how many samples do you have per time point?

do you need/want to do 0 vs. 4, 0 vs. 12, or 4 vs. 12?

also, populations of...? minions?

Yes! That is correct: it is a global comparison of growth for different bacterial populations exposed to two different conditions, with data collected at 4 and 12 days. I don't have a time 0, but I have 2 replicates per time point for each condition (2 conditions, 2 time points, 2 replicates each).

The comparison aims to determine whether one of the conditions allows better overall growth of the mixed population.

6.9 years ago
TriS ★ 4.6k

ok, so...a couple of things:

1. this sounds more like statistics than bioinformatics, so maybe stats.stackexchange.com would give you a more comprehensive answer and explanation

2. if you look at the distribution overall, a Kolmogorov-Smirnov test could work (ks.test() in R), since it's a non-parametric test that checks whether two samples come from the same distribution.

3. a Wilcoxon rank-sum test (Mann-Whitney, wilcox.test() in R) is also non-parametric; it compares ranks rather than raw values (often summarized as comparing medians) and does not require normally distributed values

4. a simple t-test (t.test() in R) could be used if your data are roughly normally distributed or you have enough samples; it compares the means of the two groups.
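For intuition, here is a rough pure-Python sketch (standard library only, so no SciPy) of the statistic behind each of the three R functions: the two-sample KS distance D, the Mann-Whitney U (what R's wilcox.test() reports as W), and Welch's t (the default form of R's t.test()). Instead of the exact or asymptotic p-values R computes, it attaches a two-sided permutation p-value, so numbers will differ somewhat from R's output. The growth-rate values and the permutation count are made up for illustration.

```python
import bisect
import random
import statistics

def ks_statistic(x, y):
    # Two-sample Kolmogorov-Smirnov D: the largest vertical gap between the
    # two empirical CDFs, checked at every observed value.
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d

def mann_whitney_u(x, y):
    # Pairs where the x value beats the y value, ties counted as one half.
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def u_centred(x, y):
    # Centre U on its null expectation n*m/2 so abs() gives a two-sided test.
    return mann_whitney_u(x, y) - len(x) * len(y) / 2

def welch_t(x, y):
    # Welch's unequal-variance t statistic.
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se

def permutation_pvalue(stat, x, y, n_perm=5000, seed=1):
    # Shuffle the condition labels and count how often |stat| on the
    # shuffled split is at least as large as the observed |stat|.
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(stat(pooled[:len(x)], pooled[len(x):])) >= observed - 1e-12:
            hits += 1
    # +1 in numerator and denominator counts the observed labelling itself.
    return (hits + 1) / (n_perm + 1)

# Hypothetical per-species growth rates for the two conditions.
rates_a = [0.26, 0.09, 0.0, 0.15, 0.21, 0.12]
rates_b = [0.05, -0.02, 0.08, 0.01, 0.10, 0.03]

for name, stat in [("KS D", ks_statistic), ("U - nm/2", u_centred), ("Welch t", welch_t)]:
    print(name, round(stat(rates_a, rates_b), 3), permutation_pvalue(stat, rates_a, rates_b))
```

The permutation step uses the same label-shuffling idea for all three statistics, which keeps the comparison between them simple; in practice the R functions named above are the ones to use.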

I don't know what your data look like or how many values you have, but I would go with either ks.test() or wilcox.test().

Thank you very much for your response! I will try the ideas you suggest.

The data are not normal. I have enough data to apply the central limit theorem, so I am currently subsampling, computing the mean 10000 times, and doing the study with the distributions of means.
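A minimal sketch of the resampling approach described above, assuming Python and made-up growth rates. The post doesn't say whether subsampling was with or without replacement or what subsample size was used, so both are parameters here (with replacement and full size is a standard bootstrap). Resampling each condition independently and pairing the draws gives a bootstrap distribution of the difference in means, summarized here with a 95% percentile interval.

```python
import random
import statistics

def resampled_means(values, n_iter=10_000, size=None, replace=True, seed=0):
    """Mean of each of n_iter resamples of `values`.

    replace=True draws with replacement (bootstrap, default size len(values));
    replace=False draws a subsample without replacement, which needs
    size < len(values) for the means to vary.
    """
    rng = random.Random(seed)
    k = size if size is not None else len(values)
    if replace:
        draw = lambda: rng.choices(values, k=k)
    else:
        draw = lambda: rng.sample(values, k)
    return [statistics.fmean(draw()) for _ in range(n_iter)]

# Hypothetical growth rates for the two conditions.
rates_a = [0.26, 0.09, 0.0, 0.15, 0.21, 0.12]
rates_b = [0.05, -0.02, 0.08, 0.01, 0.10, 0.03]

means_a = resampled_means(rates_a, seed=1)
means_b = resampled_means(rates_b, seed=2)

# Percentile interval for the difference in means (independent resamples).
diffs = sorted(a - b for a, b in zip(means_a, means_b))
low = diffs[int(0.025 * len(diffs))]
high = diffs[int(0.975 * len(diffs)) - 1]
print(round(low, 3), round(high, 3))
```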

not sure why you want to subsample and do a bootstrapping-like approach. you don't need to apply the central limit theorem... maybe check whether other papers that did the same kind of study used it?