I want to transform a TCGA mRNA expression matrix (on a linear scale) to log2 ratios and then run a feature (gene) selection, keeping the 1000 most variable genes (the genes with the highest standard deviation across samples). The workflow is the following:
- Select "good" genes before computing the log2 ratio (genes whose median signal is at least t in p% of samples);
- On the selected genes, compute the log2 ratio by dividing each gene by its median signal and then log2-transforming the resulting matrix;
- Select the 1000 most variable genes across all samples.
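To make the three steps concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not a recommended pipeline: the function name `select_variable_genes` and the input layout (a dict of gene name to per-sample values) are hypothetical, the filter is read as "signal at least t in at least p% of samples", and the `pseudocount` parameter is an addition of mine to avoid taking the log of zero.

```python
import math
import statistics


def select_variable_genes(expr, t, p, n_top=1000, pseudocount=1.0):
    """Filter genes, compute log2 ratios to the gene median, rank by SD.

    expr: dict mapping gene name -> list of linear-scale values, one per
          sample (hypothetical layout; at least two samples are needed).
    t, p: keep a gene if its value is >= t in at least p percent of samples
          (one possible reading of the filter described above).
    pseudocount: ASSUMPTION, not part of the original workflow; added to
          numerator and denominator so that zeros do not break log2.
          With pseudocount=0, a zero value or zero median raises an error.
    Returns (log2-ratio dict for the kept genes, list of top gene names).
    """
    # Step 1: keep "good" genes.
    kept = {}
    for gene, values in expr.items():
        percent_above = 100 * sum(v >= t for v in values) / len(values)
        if percent_above >= p:
            kept[gene] = values

    # Step 2: divide each gene by its own median, then log2-transform.
    log_ratios = {}
    for gene, values in kept.items():
        med = statistics.median(values)
        log_ratios[gene] = [
            math.log2((v + pseudocount) / (med + pseudocount)) for v in values
        ]

    # Step 3: rank genes by sample standard deviation, highest first.
    ranked = sorted(
        log_ratios,
        key=lambda g: statistics.stdev(log_ratios[g]),
        reverse=True,
    )
    return log_ratios, ranked[:n_top]
```

Calling it with, say, `select_variable_genes(expr, t=4, p=75)` returns the log2 ratios and the names of the most variable genes; how t and p should be chosen is exactly what remains open.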
How do I select t and p?
It appears that your post has been cross-posted to another site: http://biology.stackexchange.com/questions/29878/how-to-select-genes-before-log2-ratio-on-a-rnaseq-gene-expression-matrix-based
This is typically not recommended as it runs the risk of annoying people in both communities.
Yes, I am sorry that I annoyed you. But since they are different communities (as far as I know, they are also run by different organizations) that cover slightly different topics, I did not know which was the right place to post this question. I think my question is on topic for both communities, even though they may serve different types of users and their user bases may overlap (I did not post it anywhere else).
Sorry about that.