Sampling error in well-mixed homogeneous samples
9.6 years ago
NHEJ ▴ 360

In statistical bioinformatics, suppose I know that 15% of my population of 25,000,000 cells are B cells (the cells I'm interested in studying). I'm then told that a sample of 1,000 cells was drawn at random from this population of 25 million and that it contains a certain number of reactive B cells; let's call this number x...

Can I just take that value of x and multiply it by 0.15 to get the correct number of reactive B cells that I would see in only my B cell population (within that sample of 1,000 cells taken from the big pool of 25,000,000 cells of all types)?

I'm confident I can't do it that easily, because there is sampling error involved. In other words, it would not be correct to assume that my very small subset (1,000 cells) contains exactly the same percentage of B cells as the population as a whole (25,000,000 cells). HOWEVER, this is a very well-mixed, homogeneous total cell pool. So is there anything I can (or should) do to correct for the possible sampling error, even though the system is very homogeneous? And if the pool were perfectly homogeneous, would sampling error no longer be a concern?

9.5 years ago

Given that the population size (N) is vastly larger than the sample size (n) and the population is homogeneous, you can treat the sampling as drawing with replacement. Under that assumption the number of B cells X in the sample is a random variable that follows a binomial distribution. So the mean is n p = 150 and the standard deviation is sqrt[n p (1 - p)] ≈ 11.3, more than ten times smaller than the mean. I think this is an acceptable noise level for biological data. Note that homogeneity does not remove this sampling error; it only justifies using the binomial model to quantify it.
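To make the arithmetic concrete, here is a minimal Python sketch using only the numbers from the question (N = 25,000,000, n = 1,000, p = 0.15). The function names are made up for illustration. It also computes the standard deviation for sampling without replacement (hypergeometric, with the finite population correction), to show how little it differs from the binomial value when N is this much larger than n. The 95% range uses the usual normal approximation, which is an assumption on my part.

```python
import math

# Numbers taken from the question.
N = 25_000_000   # total population size
n = 1_000        # sample size
p = 0.15         # fraction of B cells in the population


def binomial_mean_sd(n, p):
    """Mean and SD of the B-cell count when drawing with replacement."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean, sd


def hypergeometric_sd(N, n, p):
    """SD when drawing without replacement (finite population correction)."""
    fpc = (N - n) / (N - 1)
    return math.sqrt(n * p * (1 - p) * fpc)


mean, sd = binomial_mean_sd(n, p)
sd_exact = hypergeometric_sd(N, n, p)
cv = sd / mean  # coefficient of variation

# Approximate 95% range for the B-cell count (normal approximation).
low, high = mean - 1.96 * sd, mean + 1.96 * sd

print(f"mean = {mean:.1f}, binomial sd = {sd:.3f}, hypergeometric sd = {sd_exact:.3f}")
print(f"CV = {cv:.3f}, approx. 95% range = {low:.0f} to {high:.0f} B cells")
```

The two standard deviations agree to several decimal places, which is why the with-replacement simplification is harmless here.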

However, in my experience immune cells tend to aggregate a lot. For example, when studying a highly expanded CMV-specific T-cell clonotype using NGS, we found that its frequency had a much higher coefficient of variation than those of rare clonotypes. We later found out that this was due to aggregates of activated T cells.
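A toy simulation can illustrate why aggregation inflates the variance. This is only a sketch under a strong simplifying assumption that is not from our data: cells are sampled in clumps of a fixed size, and each clump is either all B cells (with probability p) or contains none. The function name `simulate_counts` and the clump size of 10 are hypothetical choices for illustration.

```python
import random
import statistics


def simulate_counts(n, p, clump_size, replicates, seed=0):
    """Simulate B-cell counts in samples of n cells drawn as whole clumps.

    Toy assumption: every clump has `clump_size` cells and is either
    entirely B cells (probability p) or entirely other cells.
    clump_size=1 reproduces ordinary binomial sampling.
    """
    rng = random.Random(seed)
    n_units = n // clump_size  # number of independent draws per sample
    counts = []
    for _ in range(replicates):
        hits = sum(1 for _ in range(n_units) if rng.random() < p)
        counts.append(hits * clump_size)
    return counts


single = simulate_counts(1000, 0.15, clump_size=1, replicates=2000)
clumped = simulate_counts(1000, 0.15, clump_size=10, replicates=2000)

cv_single = statistics.pstdev(single) / statistics.mean(single)
cv_clumped = statistics.pstdev(clumped) / statistics.mean(clumped)

print(f"CV with single cells:  {cv_single:.3f}")
print(f"CV with clumps of 10:  {cv_clumped:.3f}")
```

Under this toy model the count is the clump size times a binomial count over n / clump_size draws, so the standard deviation grows by roughly the square root of the clump size while the mean stays at n p. Real aggregates are of course neither fixed in size nor pure in composition.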
