Hi there,
I want to select random genes while keeping the expression level similar in both pools.
Pool-A: 5000 genes
GeneIDX1 ExpVal_rpkm
GeneIDX2 ExpVal_rpkm and so on
Pool-B: 1000 genes
GeneIDY1 ExpVal_rpkm
GeneIDY2 ExpVal_rpkm and so on
Now I want to select 1000 random genes from Pool-A whose expression levels are similar to those of the 1000 genes in Pool-B.
Any suggestions or help on how to do this (neatly) would be appreciated.
I am thinking of binning the Pool-B genes by expression level and then applying the same bin boundaries to Pool-A. From each bin I would then randomly select as many Pool-A genes as there are Pool-B genes in that bin.
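The binning idea above could be sketched roughly as follows. This is only a minimal sketch under some assumptions that are not in the original post: each pool is held as a dict mapping gene ID to RPKM, bin edges are taken at quantiles of Pool-B (so the result does not depend on whether the values are log-transformed), and Pool-A genes outside Pool-B's expression range are dropped. The function names, `n_bins`, and `seed` are hypothetical.

```python
import bisect
import random
from collections import defaultdict


def quantile_edges(values, n_bins):
    """Interior bin edges at equally spaced quantiles of `values`."""
    ordered = sorted(values)
    edges = {ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)}
    return sorted(edges)  # duplicates collapse if many values are tied


def matched_sample(pool_a, pool_b, n_bins=10, seed=0):
    """Per expression bin, draw as many Pool-A genes as Pool-B has in that bin.

    pool_a, pool_b: dicts mapping gene ID -> RPKM (assumed input layout).
    Returns (selected Pool-A gene IDs, {bin index: number of genes missing}).
    """
    rng = random.Random(seed)
    edges = quantile_edges(list(pool_b.values()), n_bins)
    lo, hi = min(pool_b.values()), max(pool_b.values())

    # How many genes Pool-B contributes to each bin
    need = defaultdict(int)
    for value in pool_b.values():
        need[bisect.bisect_right(edges, value)] += 1

    # Pool-A candidates per bin, ignoring genes outside Pool-B's range
    candidates = defaultdict(list)
    for gene in sorted(pool_a):  # sorted so a given seed is reproducible
        value = pool_a[gene]
        if lo <= value <= hi:
            candidates[bisect.bisect_right(edges, value)].append(gene)

    selected, shortfall = [], {}
    for b, k in sorted(need.items()):
        avail = candidates.get(b, [])
        if len(avail) < k:
            # Not enough Pool-A genes in this bin: record it instead of failing
            shortfall[b] = k - len(avail)
            k = len(avail)
        selected.extend(rng.sample(avail, k))  # sampling without replacement
    return selected, shortfall
```

Calling `matched_sample(pool_a, pool_b)` returns the chosen Pool-A gene IDs plus a record of any bins where Pool-A did not have enough genes. A non-empty shortfall means the two distributions differ too much in that range for a full 1000-gene match with this bin width.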
Thanks!
To be honest, this sounds a bit like it conflates random selection with the constraint that the selected genes be similar in expression, even though the two pools' distributions might be different. The problem I see is that some genes may have many possible replacements while others have few or none, yet that information is not captured in any measurable way. Resampling will just make the genes with fewer possible replacements dominate the outcome, at the expense of the genes that can be replaced with many others.
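One way to at least make that concern measurable is to count, for each Pool-B gene, how many Pool-A genes could stand in for it. The sketch below is only an illustration: the `fold` tolerance, the function name, and the toy values are all invented here, and "similar" is assumed to mean "within a fixed fold change".

```python
import bisect


def candidate_counts(pool_a, pool_b, fold=1.25):
    """For each Pool-B gene, count Pool-A genes whose RPKM lies within
    [value / fold, value * fold]. `fold` is a hypothetical tolerance."""
    a_sorted = sorted(pool_a.values())
    counts = {}
    for gene, value in pool_b.items():
        lo = bisect.bisect_left(a_sorted, value / fold)
        hi = bisect.bisect_right(a_sorted, value * fold)
        counts[gene] = hi - lo
    return counts


# Toy data, invented purely for illustration
pool_a = {"A1": 1.0, "A2": 1.1, "A3": 1.2, "A4": 50.0}
pool_b = {"B1": 1.05, "B2": 48.0, "B3": 500.0}
counts = candidate_counts(pool_a, pool_b)
print(counts)  # {'B1': 3, 'B2': 1, 'B3': 0}
```

Here B1 has three interchangeable candidates, B2 has exactly one (so A4 would appear in every resample), and B3 has none. Looking at the distribution of these counts before sampling shows how much of the "random" set is effectively forced.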