Question: Ranking of genes in multiple samples
14 months ago by
I have log2-transformed expression data for multiple samples that all belong to one condition (tumour). I don't have a control dataset. Given data from a single condition, how should I rank genes across samples (highly expressed versus very lowly expressed genes)? Should I take the mean of the expression values across all samples and rank the genes accordingly? Is that a valid way to rank genes when all samples come from one condition?

How many samples is 'multiple' samples? What are you ultimately trying to achieve by ranking them - what is the biological question?

Thanks for your quick response. I have 20 samples, all from tumour. My main biological goal is to predict genes that play a role in the tumour state. A control dataset is missing in my case. (With a control dataset, it would be relatively easy to rank genes by performing differential expression analysis and setting a cutoff of 2-fold up- or down-regulation.) Since I only have data from one condition, I want to rank genes by their mean across samples and then do GO enrichment to see what kinds of biological entities or functions are enriched among the top-ranked genes. Is there any other way to rank genes across samples? (I know this is a basic question, but I want to be sure of my strategy.)

You can use the median instead of the mean, since the mean is sensitive to outliers.
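
To illustrate the mean-versus-median point, here is a minimal sketch in plain Python. The gene names and log2 values are made up for illustration (they are not from the poster's data), and `rank_genes` is a hypothetical helper, not a function from any package. A single outlier sample is enough to change the ranking by mean, while the ranking by median is unaffected.

```python
from statistics import mean, median

# Hypothetical log2 expression values: gene -> one value per tumour sample.
expression = {
    "GENE_A": [8.0, 8.2, 7.9, 8.1, 8.0],
    "GENE_B": [5.0, 5.1, 4.9, 5.0, 15.0],  # one outlier sample inflates the mean
    "GENE_C": [6.5, 6.4, 6.6, 6.5, 6.5],
}

def rank_genes(expr, summary):
    """Return gene names sorted from highest to lowest summary value."""
    return sorted(expr, key=lambda gene: summary(expr[gene]), reverse=True)

rank_by_mean = rank_genes(expression, mean)
rank_by_median = rank_genes(expression, median)

print("Ranked by mean:  ", rank_by_mean)
print("Ranked by median:", rank_by_median)
```

In this toy example GENE_B ranks above GENE_C by mean only because of its one extreme sample; by median it drops below GENE_C.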

It sounds like you are assuming that the genes which show, on average, the highest expression across your samples will be the interesting (disease-relevant) ones. I'm not sure I agree with this. Simply taking an average value will not account for the variability between samples.
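
As a rough sketch of the variability point, the plain-Python snippet below reports the standard deviation alongside the mean for each gene. The gene names and values are invented for illustration, and `summarise` is a hypothetical helper. Using the standard deviation as the spread measure is just one assumption here; other measures could be used instead.

```python
from statistics import mean, stdev

# Hypothetical log2 expression values: two genes with the same mean
# but very different behaviour across the tumour samples.
expression = {
    "GENE_X": [7.0, 7.1, 6.9, 7.0, 7.0, 7.0],    # stable across samples
    "GENE_Y": [3.0, 11.0, 4.0, 10.0, 3.5, 10.5],  # highly variable across samples
}

def summarise(expr):
    """Return gene -> (mean, sample standard deviation) across samples."""
    return {gene: (mean(values), stdev(values)) for gene, values in expr.items()}

summary = summarise(expression)
for gene, (avg, sd) in summary.items():
    print(f"{gene}: mean={avg:.2f}, sd={sd:.2f}")
```

Both genes would get essentially the same rank by mean, yet only one of them is consistently expressed at that level, which a ranking on the average alone cannot show.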

