I am working with RNA-Seq data. After mapping the reads to the reference and obtaining RPKM values for each gene, I want to normalize the expression values.
Starting from the RPKM values, I removed the lines with too many zeros, which left about 12K gene expression profiles.
The RPKM values range from 0 to 1e-6 and do not follow a normal distribution.
I tried two methods to normalize the expression profiles:
1) Assigning the smallest value to the zeros and then log2-transforming the data. The resulting distribution looks roughly normal, but it is not actually normalized (it does not fit an N(0,1) distribution).
2) Transforming the ranks of the expression values for each gene to the corresponding quantiles of an N(0,1) distribution. However, the resulting distributions did not look good enough.
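
To make the two attempts concrete, here is a rough Python sketch of what I mean, using only the standard library. The function names, the choice of the smallest non-zero value as the floor for zeros, the extra z-score step after log2, and the (rank - 0.5) / n quantile offset are all assumptions for illustration, not necessarily the exact procedure I ran. Each function works on the values of one gene across samples.

```python
import math
from statistics import NormalDist, mean, pstdev


def log2_zscore(values):
    """Method 1 sketch: floor zeros, log2-transform, then standardize.

    Assumption: zeros are replaced by the smallest non-zero value, and a
    z-score step is added so the result has mean 0 and standard deviation 1.
    """
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("profile contains no positive values")
    floor = min(positive)
    logged = [math.log2(v if v > 0 else floor) for v in values]
    mu, sd = mean(logged), pstdev(logged)
    if sd == 0:
        raise ValueError("profile is constant after the log2 transform")
    return [(x - mu) / sd for x in logged]


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def rank_to_normal(values):
    """Method 2 sketch: map ranks to quantiles of N(0, 1).

    Assumption: the quantile for rank r is (r - 0.5) / n, one of several
    offsets in common use.
    """
    n = len(values)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


# Hypothetical RPKM values for one gene across five samples.
gene = [0.0, 1.0, 2.0, 4.0, 8.0]
print(log2_zscore(gene))
print(rank_to_normal(gene))
```

Note that with many tied zeros, method 2 gives all of them the same quantile, which may be part of why the transformed profiles do not look normal.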
Does anyone have a better solution?