I want to transform a TCGA mRNA expression matrix (on a linear scale, i.e. not log-transformed) into log2-ratios and then run feature (gene) selection, keeping the 1000 most variable genes (those with the highest standard deviation across samples). The workflow is as follows:
1. Before computing log2-ratios, select "good" genes: keep each gene whose signal is at least t in at least p% of samples;
2. For the selected genes, compute log2-ratios by dividing each gene's values by that gene's median signal across samples, then log2-transforming the resulting matrix;
3. Select the 1000 most variable genes (highest standard deviation across all samples).
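The three steps above can be sketched in Python with NumPy. This is a minimal sketch, assuming the matrix is a genes x samples `numpy` array on the linear scale. The function names (`filter_genes`, `log2_ratio`, `top_variable`, `select_genes`) are hypothetical, and the `pseudocount` is an added assumption, not part of the original workflow: it avoids `log2(0)` for samples with zero signal.

```python
import numpy as np


def filter_genes(expr, t, p):
    """Step 1: boolean mask of genes whose signal is >= t in at least p% of samples.

    expr: genes x samples array on the linear (not log) scale.
    """
    frac_above = (expr >= t).mean(axis=1)  # per-gene fraction of samples with signal >= t
    return frac_above >= p / 100.0


def log2_ratio(expr, pseudocount=1.0):
    """Step 2: divide each gene by its median across samples, then take log2.

    The pseudocount is an assumption (not in the original workflow) that keeps
    zero-signal samples from producing -inf.
    """
    x = expr + pseudocount
    med = np.median(x, axis=1, keepdims=True)  # one median per gene (row)
    return np.log2(x / med)


def top_variable(ratios, n=1000):
    """Step 3: row indices of the n genes with the highest standard deviation."""
    sd = ratios.std(axis=1, ddof=1)  # sample standard deviation per gene
    n = min(n, ratios.shape[0])      # cannot return more genes than passed the filter
    return np.argsort(sd)[::-1][:n]  # sort descending by SD, keep the first n


def select_genes(expr, t, p, n=1000, pseudocount=1.0):
    """Run steps 1-3; return original row indices and log2-ratios of the selected genes."""
    keep = filter_genes(expr, t, p)
    kept_idx = np.flatnonzero(keep)  # map filtered rows back to original rows
    ratios = log2_ratio(expr[keep], pseudocount)
    top = top_variable(ratios, n)
    return kept_idx[top], ratios[top]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated data only: 5000 genes x 100 samples of linear-scale values.
    expr = rng.lognormal(mean=3.0, sigma=1.5, size=(5000, 100))
    gene_idx, ratios = select_genes(expr, t=10.0, p=20.0, n=1000)
    print(gene_idx.shape, ratios.shape)
```

Because step 3 ranks genes by spread, genes kept only by a loose filter (mostly near-background values) can rank highly from noise alone. That interaction is the main reason the choice of t and p matters.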
How do I select t and p?
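One pragmatic way to approach the question, offered as a sketch rather than a definitive rule, is to tabulate how many genes pass the step-1 filter over a grid of t and p values, and to look at the overall signal distribution. A common heuristic is to place t above the background-noise level of the platform and to tie p to the size of the smallest sample group of interest, so that genes expressed only in that subgroup are not discarded. The grid values and the function name `retained_counts` below are hypothetical, and the data are simulated.

```python
import numpy as np


def retained_counts(expr, t_values, p_values):
    """Count genes whose signal is >= t in at least p% of samples,
    for every (t, p) pair. Rows follow t_values, columns follow p_values."""
    counts = np.zeros((len(t_values), len(p_values)), dtype=int)
    for i, t in enumerate(t_values):
        frac_above = (expr >= t).mean(axis=1)  # per-gene fraction of samples >= t
        for j, p in enumerate(p_values):
            counts[i, j] = int((frac_above >= p / 100.0).sum())
    return counts


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated linear-scale data; substitute the real genes x samples matrix.
    expr = rng.lognormal(mean=3.0, sigma=1.5, size=(5000, 100))
    t_grid = [1, 5, 10, 50, 100]  # hypothetical thresholds on the linear scale
    p_grid = [10, 20, 50, 80]     # hypothetical percentages of samples
    table = retained_counts(expr, t_grid, p_grid)
    for t, row in zip(t_grid, table):
        print(f"t={t:>4}: " + "  ".join(f"p={p}%:{c}" for p, c in zip(p_grid, row)))
    # Overall signal quantiles help locate where background noise ends.
    print("signal quantiles:", np.percentile(expr, [5, 25, 50, 75, 95]).round(1))
```

Any (t, p) pair that leaves fewer than 1000 genes makes step 3 impossible as stated, so the table also shows the upper bound on how strict the filter can be.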