How to compare these distributions?
8.7 years ago
samuelmiver ▴ 440

I have two different data frames, one per condition, each with 3 columns:

specie : a qualitative value (the species name)

population@4days : population calculated for that species at 4 days

population@12days : population calculated for that species at 12 days

Using basic calculations I have computed the growth rate for each population, and now I want to compare the two distributions of rates (I am interested in a global conclusion, not one specific to each species).
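A minimal sketch of the growth-rate step, for illustration only. The post does not say which formula was used, so this assumes the rate is the per-day change in log population between day 4 and day 12. The data frame `cond_a`, the column names `pop_4d` and `pop_12d`, and all values are made up.

```r
# Hypothetical example data; species names and counts are invented.
cond_a <- data.frame(
  specie  = c("sp1", "sp2", "sp3"),
  pop_4d  = c(100, 250, 40),
  pop_12d = c(800, 500, 40)
)

# Assumed definition: exponential growth rate per day over the 8-day interval.
growth_rate <- function(pop_start, pop_end, days = 12 - 4) {
  (log(pop_end) - log(pop_start)) / days
}

# One rate per species; repeat for the second condition's data frame.
cond_a$rate <- growth_rate(cond_a$pop_4d, cond_a$pop_12d)
print(cond_a)
```

A rate defined this way is relative to each species' own day-4 value, so a species whose population does not change gets a rate of 0.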

My question is whether I need to normalize in order to compare them correctly. The number and identity of the species differ between the two data frames, and the total population also differs between them.

If I have to, what would be the best way to normalize and compare them?

Thank you in advance for your consideration.

sequencing statistics growth distributions • 2.5k views

If I understand your question correctly, you are asking whether (and how) to normalize and compare the populations at 4 vs. 12 days. Is that correct?

If so, do you have a day 0? That would be the starting point to normalize the data to.

How many samples do you have per time point?

Do you need/want to compare 0 vs. 4, 0 vs. 12, or 4 vs. 12?

Also, population of...? Minions?


Yes! That is correct: it is a global comparison of growth for different bacterial populations exposed to two different conditions, with data collected at 4 and 12 days. I don't have a time 0, but I have 2 replicates per time point for each condition (2 conditions, 2 time points, 2 replicates each).

The comparison tries to determine whether one of the conditions allows better overall growth for the mixed population.

8.7 years ago
TriS ★ 4.7k

ok, so...a couple of things:

1. this sounds more like statistics than bioinformatics, so maybe stats.stackexchange.com would give you a more comprehensive answer and explanation

2. if you look at the distribution overall, a Kolmogorov-Smirnov test could work (ks.test() in R), since it's a non-parametric test that checks whether the two sets of values come from the same distribution.

3. a Wilcoxon rank-sum test (wilcox.test() in R) is also non-parametric; it tests whether one group is shifted relative to the other (often read as a difference in medians) and does not require normally distributed values.

4. a simple t-test (t.test() in R) could be used if you have normally distributed data and enough samples; it compares the means of the two groups.

I don't know how your data look or how many values you have, so I would go with either ks.test() or wilcox.test()
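For illustration, here is a rough sketch of how the tests listed above can be called on two vectors of growth rates. The vectors `rates_a` and `rates_b` are simulated placeholders, not real data, and their lengths differ on purpose, since the two conditions contain different numbers of species.

```r
set.seed(1)  # reproducible simulated data

# Simulated growth rates standing in for the two conditions (not real data).
rates_a <- rnorm(40, mean = 0.20, sd = 0.10)
rates_b <- rnorm(55, mean = 0.30, sd = 0.10)

# Kolmogorov-Smirnov: do the two samples come from the same distribution?
ks_res <- ks.test(rates_a, rates_b)

# Wilcoxon rank-sum (Mann-Whitney): is one sample shifted relative to the other?
wx_res <- wilcox.test(rates_a, rates_b)

# Welch two-sample t-test: compares means, appropriate for roughly normal data.
t_res <- t.test(rates_a, rates_b)

print(c(ks = ks_res$p.value, wilcoxon = wx_res$p.value, t = t_res$p.value))
```

All three functions take the two samples as separate numeric vectors and return an object whose `p.value` element holds the p-value.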


Thank you very much for your response! I will try the ideas you suggest.


The data are not normally distributed. I have enough data to apply the central limit theorem, so I am currently subsampling and computing the mean 10,000 times, and doing the analysis on the resulting distributions of means.
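A rough sketch of the subsampling procedure described here, under assumptions: the post gives neither the subsample size nor whether sampling is with replacement, so `subsample_size` and the use of sampling without replacement are guesses, and `rates` is simulated skewed data rather than the real growth rates.

```r
set.seed(1)  # reproducible simulated data

# Simulated, right-skewed stand-in for one condition's non-normal growth rates.
rates <- rexp(200, rate = 5)

n_iter <- 10000        # number of subsamples, as stated in the post
subsample_size <- 50   # assumed; not given in the post

# Draw a subsample without replacement and record its mean, n_iter times.
subsample_means <- replicate(n_iter, mean(sample(rates, subsample_size)))

print(summary(subsample_means))
```

The same call would be repeated on the second condition's rates to obtain the other distribution of means.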


Not sure why you want to subsample and use a bootstrapping-like approach. You don't need to apply the central limit theorem, since ks.test() and wilcox.test() do not assume normality. Maybe check whether other papers that did the same kind of study used it?
