I would like to be sure I am using the right metrics for 3 different tasks I am performing on mRNA-seq data from 50 different samples (i.e. 50 different cell lines). I have read a lot about this but I am still in doubt.
Here is the scenario: I have 50 different cell lines, and for each of them I have mRNA-seq data (a single measurement per cell line, due to high costs). I have to complete the following 3 tasks:
I want to evaluate the abundance distribution of the genes within each sample, meaning that I want to see whether in certain cell lines the genes are all expressed at roughly the same level, or whether in some samples there is a set of genes that is, for example, 10-20 times more abundant than the others. To do this I can use RPKM or TPM, with the latter probably more appropriate.
I want to compare the expression levels of all the genes across the different samples, basically doing some sort of PCA or hierarchical clustering on the data. As I will have to compare the 50 different data sets, I would need to use RPKM or TMM-normalized values (e.g. TMM-normalized FPKM). Is the latter the most appropriate?
I would also like to compare the mRNA level of selected genes against the abundance of the corresponding protein products. For this task I will have to compare the protein iBAQ value (some sort of absolute protein abundance) against an mRNA value. As a corresponding 'absolute' mRNA value does not exist (as far as I understand), I guess that TPM, RPKM, or TMM-normalized values are all valid here.
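To make sure I have the definitions straight, here is a minimal sketch of how I understand RPKM and TPM to be computed from raw read counts and gene lengths. The counts, the gene lengths, the sample names, and the function names `rpkm` and `tpm` are all made up for illustration; this is not the output or the API of any particular tool.

```python
# Hypothetical toy data: raw read counts for three genes in two samples,
# plus the gene lengths in base pairs (all numbers are invented).
counts = {
    "sample_A": [100, 300, 600],
    "sample_B": [200, 600, 200],
}
lengths_bp = [1000, 2000, 3000]


def rpkm(sample_counts, lengths_bp):
    """Reads per kilobase of gene per million mapped reads.

    Normalizes by library size first, then by gene length.
    """
    millions = sum(sample_counts) / 1e6
    return [c / millions / (l / 1e3) for c, l in zip(sample_counts, lengths_bp)]


def tpm(sample_counts, lengths_bp):
    """Transcripts per million.

    Normalizes by gene length first, then rescales so that the
    values of each sample sum to one million.
    """
    rates = [c / (l / 1e3) for c, l in zip(sample_counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


rpkm_values = {name: rpkm(c, lengths_bp) for name, c in counts.items()}
tpm_values = {name: tpm(c, lengths_bp) for name, c in counts.items()}

for name in counts:
    print(name,
          "RPKM total:", round(sum(rpkm_values[name])),
          "TPM total:", round(sum(tpm_values[name])))
```

If I have this right, TPM adds up to one million in every sample while the RPKM total differs from sample to sample, which is why I think TPM is the better choice for looking at the distribution within each sample (task 1). Within a single sample the two are proportional to each other, so the ranking of genes is the same.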
Did I get this right?