I am trying to perform a within-sample analysis of expression data using FPKM: essentially, I want to rank my genes by expression level. I am not doing differential expression.
I have 3 biological replicates of my sample. What I would like to be able to do is generate an FPKM for each gene that is representative of the three replicates and rank based on that.
I have considered three approaches:
1. Run Cuffquant on the three alignments separately and ask Cuffnorm to treat them as replicates. This would be my ideal, but I would need something to compare against in order to get Cuffnorm to run. Could I just compare my replicates to themselves?
2. Run Cuffquant on the three alignments separately, ask Cuffnorm to treat them as separate samples (giving me FPKM values per sample), and then take the mean across replicates.
3. Run Cuffquant on the merged alignments. This won't account in any way for variation between the replicates.
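To make the difference between taking a mean (approach 2) and pooling (approach 3) concrete, here is a simplified derivation. It assumes the plain count-based definition of FPKM and ignores Cufflinks' effective-length and multi-read corrections, so treat it as an approximation of what Cuffquant actually computes. Let $C_{g,r}$ be the fragments assigned to gene $g$ in replicate $r$, $N_r$ the total mapped fragments in replicate $r$, $L_g$ the gene length in bases (assumed identical across replicates), and $R$ the number of replicates:

$$
F_{g,r} = \frac{10^9\, C_{g,r}}{L_g\, N_r}, \qquad
\bar F_g = \frac{1}{R}\sum_{r=1}^{R} F_{g,r}
$$

$$
F_g^{\text{pool}} = \frac{10^9 \sum_{r=1}^{R} C_{g,r}}{L_g \sum_{r=1}^{R} N_r}
= \sum_{r=1}^{R} w_r\, F_{g,r}, \qquad
w_r = \frac{N_r}{\sum_{s=1}^{R} N_s}
$$

Under these assumptions, pooling gives a library-size-weighted mean of the per-replicate FPKMs, while approach 2 gives the unweighted mean; the two coincide when the replicates are sequenced to equal depth. The practical difference is that pooling discards the per-replicate values, so it leaves no handle on between-replicate spread, whereas approach 2 keeps them.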
I have mostly ruled out approach 3, as pooling the data seems to defeat the purpose of replicating. So my questions are:
For approach 1, does anyone know a good way of generating representative FPKM values from replicates without running a differential expression analysis? I'm happy to use software other than Cufflinks.
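One tool-agnostic option is to summarise each gene by its mean FPKM and report the replicate spread alongside it, here as a coefficient of variation (CV). This is a minimal sketch, not a Cufflinks feature: it assumes the per-replicate FPKMs have already been exported into a gene-to-values mapping, and the gene names, values, and the `max_cv` cutoff are all hypothetical.

```python
import math
from statistics import mean, stdev

# Hypothetical per-replicate FPKM values: gene -> [rep1, rep2, rep3].
FPKM = {
    "geneA": [120.0, 110.0, 130.0],
    "geneB": [15.0, 40.0, 5.0],
    "geneC": [0.0, 0.0, 0.0],
    "geneD": [55.0, 60.0, 50.0],
}


def summarise(values):
    """Return (mean, coefficient of variation) of one gene's replicate FPKMs."""
    m = mean(values)
    # The CV is undefined when the gene is not detected in any replicate.
    cv = stdev(values) / m if m > 0 else math.nan
    return m, cv


def rank_with_variability(fpkm, max_cv=0.5):
    """Rank genes by mean FPKM (highest first) and flag genes whose replicates disagree.

    max_cv is an arbitrary illustrative cutoff, not an established threshold.
    """
    rows = []
    for gene, values in fpkm.items():
        m, cv = summarise(values)
        noisy = not math.isnan(cv) and cv > max_cv
        rows.append((gene, m, cv, noisy))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    for rank, (gene, m, cv, noisy) in enumerate(rank_with_variability(FPKM), start=1):
        note = "  <- high replicate variability" if noisy else ""
        print(f"{rank}\t{gene}\tmean={m:.1f}\tCV={cv:.2f}{note}")
```

The CV column is exactly the information that pooling throws away: it shows which positions in the ranking are supported by all three replicates and which rest on one replicate disagreeing with the others (geneB in this made-up table).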
For approach 2, is taking the mean valid? Cuffnorm uses a more sophisticated normalisation method, but perhaps I don't need it here: my feeling is that even if library sizes differ between the replicates, the rank order of genes should still be similar, and that is what I am interested in. If this is a sensible approach, is the mean a good measure of central tendency here, or should I consider something else?
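The intuition about rank order can be checked directly by computing Spearman's rank correlation between each pair of replicates, and the choice of summary can be tested by ranking genes with both the mean and the median. The sketch below is a plain-Python illustration under assumed, made-up FPKM values (geneE deliberately has one outlying replicate); in practice `scipy.stats.spearmanr` would do the correlation step.

```python
import math
from itertools import combinations
from statistics import mean, median

# Hypothetical FPKM table: gene -> [rep1, rep2, rep3].
FPKM = {
    "geneA": [120.0, 110.0, 130.0],
    "geneB": [15.0, 40.0, 5.0],
    "geneC": [0.5, 0.0, 1.0],
    "geneD": [55.0, 60.0, 50.0],
    "geneE": [8.0, 9.0, 700.0],
}


def average_ranks(values):
    """Rank values (1 = smallest), giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def gene_ranking(fpkm, summary):
    """Order genes from highest to lowest by a per-gene summary such as mean or median."""
    return sorted(fpkm, key=lambda g: summary(fpkm[g]), reverse=True)


if __name__ == "__main__":
    genes = list(FPKM)
    n_reps = len(next(iter(FPKM.values())))
    # Rank concordance between every pair of replicates.
    for a, b in combinations(range(n_reps), 2):
        rho = spearman([FPKM[g][a] for g in genes], [FPKM[g][b] for g in genes])
        print(f"rep{a + 1} vs rep{b + 1}: Spearman rho = {rho:.2f}")
    print("ranked by mean:  ", gene_ranking(FPKM, mean))
    print("ranked by median:", gene_ranking(FPKM, median))
```

With these made-up numbers, replicates 1 and 2 agree perfectly on gene order (rho = 1.00) while replicate 3 scores 0.40 against each, largely because of geneE. Because Spearman's rho uses ranks only, multiplying a replicate by a constant (a pure library-size difference) leaves it unchanged, which supports the intuition in the question; it does not, however, correct for any difference that is not a simple scaling. With only three replicates the median is just the middle value, so it ignores a single outlying replicate (geneE drops from first to fourth), whereas the mean is pulled upward by it.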
Does anyone have any other approaches that I haven't thought of?
Thanks for your help!