Why does ranking count data lead to a uniform distribution of the features?
9.8 years ago
jack ▴ 960

One of the normalization methods for count data is ranking. That is, you rank the expression of each gene across different samples, and this leads to a uniform distribution for every gene.

Does anyone know why it leads to a uniform distribution?

RNA-Seq R rna-seq next-gen sequencing • 2.7k views
9.8 years ago
Michael 54k

I guess it doesn't. I guess you do not mean 'uniform'? Please try to read a bit on the topic before you make such claims.

Edit: your claim is not (generally) true if there are ties in the ranks! (Proof trivial.)

Please read "3.2.9 Ranks" of the paper you cite:

The ranks are uniformly distributed from zero to the sample size. Hence, the ranks lead to exactly the same distribution for all genes, which directly leads to exactly equal means and variances for all genes.

While you claim:

that is you rank the expression of each genes across different samples and it's lead to uniform distribution for every gene.

These do not have the same meaning. Your sentence could be interpreted as saying that the variation in counts across samples for each gene follows a uniform distribution, whereas what the authors state is that, for each sample, the ranks of the counts are uniformly distributed.

The reason is simple: if you rank N items, you get a (possibly different) permutation of 1..N as ranks for each sample. Each value from the domain 1..N is present exactly once, which makes the distribution uniform (identical probability for each value). This disregards ties, though! If there are many ties, then the resulting distribution of ranks is no longer uniform.
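As a rough illustration of the point about ties, here is a minimal Python sketch. The helper `min_ranks` and the toy counts are made up for this example (they are not from the paper), and I am assuming the "tied values share the lowest rank" convention; other tie-handling rules would give different numbers but the same conclusion.

```python
from collections import Counter


def min_ranks(values):
    """Rank values from 1..N; tied values share the lowest rank of their group.

    Hypothetical helper for illustration only.
    """
    first_pos = {}
    for pos, v in enumerate(sorted(values), start=1):
        # Remember only the first (lowest) position at which each value appears.
        first_pos.setdefault(v, pos)
    return [first_pos[v] for v in values]


# Toy counts (made up): one vector without ties, one with many zeros.
counts_no_ties = [12, 0, 57, 3, 140]
counts_with_ties = [0, 0, 0, 3, 140]

ranks_no_ties = min_ranks(counts_no_ties)
ranks_with_ties = min_ranks(counts_with_ties)

# Without ties, every rank 1..N occurs exactly once (uniform).
print(sorted(Counter(ranks_no_ties).items()))
# With ties, rank 1 occurs three times and ranks 2 and 3 never occur.
print(sorted(Counter(ranks_with_ties).items()))
```

With zero-inflated count data, which is common in RNA-Seq, the second case is the realistic one.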

That is true for any set of N distinct integers, by the way: a set contains each value exactly once, so each value has probability 1/N.
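To spell out what that implies for the moments (a short worked note; the symbol R, for the rank of an item picked at random, is my own notation, and I am assuming ranks 1..N with no ties rather than the zero-based range in the quoted passage):

```latex
P(R = k) = \frac{1}{N}, \quad k = 1, \dots, N, \qquad
\mathbb{E}[R] = \frac{N+1}{2}, \qquad
\operatorname{Var}(R) = \frac{N^2 - 1}{12}.
```

Both quantities depend only on N, which is consistent with the quoted statement that the ranks give equal means and variances.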


(Side note: one can also work out from a different angle how this sentence was meant, i.e. ranking across repeated measurements or samples (rows) or within samples (columns):

Imagine counts of several genes: which ranks make sense to use? If we ranked the count values within each row (ranking across columns for each gene), we would have no normalization for library size whatsoever, and the ranks would therefore not be very informative.

If you instead compared ranks of genes within columns (ranking all genes in each sample), the library size effect would be removed, achieving effective normalization. In this case, the means of the row ranks cannot be equal for every pair of genes unless your ranked counts are totally random. Instead, the more reproducible your samples are, the better the rank agreement will be between all samples.)
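A small sketch of the two ranking directions from the side note, in Python. Everything here is a made-up toy: the gene names, the counts, and the helper `ranks` are assumptions for illustration, and the three samples are constructed as the same expression profile sequenced at 1x, 3x and 10x depth.

```python
def ranks(values):
    """Rank values from 1..N (assumes no ties; hypothetical helper)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


# Rows are genes, columns are samples. Sample 2 is sample 1 at 3x depth,
# sample 3 is sample 1 at 10x depth (toy data, not real counts).
counts = {
    "geneA": [5, 15, 50],
    "geneB": [20, 60, 200],
    "geneC": [1, 3, 10],
}
genes = list(counts)
n_samples = 3

# Rank all genes within each sample (column): depth cancels out,
# so every sample gives the same gene ranking.
within_sample = [ranks([counts[g][s] for g in genes]) for s in range(n_samples)]

# Rank samples within each gene (row): the ranks only reflect depth,
# so every gene gets the same uninformative 1, 2, 3.
within_gene = {g: ranks(counts[g]) for g in genes}

print(within_sample)
print(within_gene)
```

In this toy case the within-sample ranks agree perfectly across samples because the samples are exact multiples of each other; with real data the agreement would only be as good as the reproducibility of the samples.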

Indeed it does!! So do you have a reason why not?! ;-) Look at figure 2 of this paper.

In science, you need to use very precise language. For example, "leads to a uniform distribution for each gene" is something different from "leads to a uniform distribution over all genes".
