I am working on an RNA-Seq data set from mouse. I have done the mapping and the counting and now have a table of count data (a matrix of counts for each gene and sample), which looks like this:

    ID                  1     2     3     4     5    6     ...  20
    ENSMUSG00000058997  5250  5187  1431  3177  575  3870  ...  4375
    ENSMUSG00000038079  639   513   630   3672  640  313   ...  292
    ENSMUSG00000053168  336   1250  1242  4330  254  184   ...  2146
    ...

In total I have 20 single mouse samples from 9 different conditions.

I also have a vector of scores, which we calculated from different phenotypic parameters such as age, weight, longevity and a few others. The score ranges from 3 to 16, where `3` means phenotypically best and `16` phenotypically worst.

Our main goal is to find a way to identify the genes that contribute to these scores. The score was calculated independently of the conditions, so that we can analyse each mouse as a single entity. I do have multiple animals with the same score value, but some score values are unique. For that reason I don't have the usual replicates, so AFAIK regression analysis cannot be done.

To get around the fact that some genes have a very high read count in some samples compared to others, we were thinking about normalizing each row by calculating a fold-change for each gene/sample: calculate the average for each gene over all samples, and then divide each gene/sample value by this average.
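A minimal sketch of this row-wise normalization in plain Python, assuming the counts are held in a dictionary that maps each gene ID to its list of per-sample counts. The function name `fold_change_rows` and the zero-mean handling are illustrative choices, not part of any existing package, and the example only uses the first six samples shown in the table.

```python
def fold_change_rows(counts):
    """Divide every count in a gene's row by that gene's mean over all samples."""
    normalized = {}
    for gene, row in counts.items():
        gene_mean = sum(row) / len(row)
        if gene_mean == 0:
            # Assumption: a gene with no reads in any sample is kept as all zeros,
            # since a fold-change relative to a mean of zero is undefined.
            normalized[gene] = [0.0] * len(row)
        else:
            normalized[gene] = [value / gene_mean for value in row]
    return normalized


# Counts for samples 1-6 of the three genes shown in the table (samples 7-20 omitted).
counts = {
    "ENSMUSG00000058997": [5250, 5187, 1431, 3177, 575, 3870],
    "ENSMUSG00000038079": [639, 513, 630, 3672, 640, 313],
    "ENSMUSG00000053168": [336, 1250, 1242, 4330, 254, 184],
}

fold_changes = fold_change_rows(counts)
for gene, row in fold_changes.items():
    print(gene, [round(value, 2) for value in row])
```

After this step every gene's row has a mean of 1, so genes with very different absolute read counts end up on a comparable scale. Note that this sketch only rescales within a gene; it does not correct for differences in sequencing depth between samples.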

So the idea is to calculate some kind of correlation between the score vector and each gene. The closer the correlation coefficient is to 1, the more influence the gene should have on the behaviour of the score.
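To make the idea concrete, here is a rough sketch that computes a Pearson correlation between each gene's row and the score vector and ranks the genes by the absolute value of the coefficient. Pearson is picked here only for illustration (a rank-based coefficient may suit an ordinal score better), the helper names `pearson` and `correlate_genes` are made up for this example, and the gene names and values are hypothetical toy data rather than real measurements.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # a constant row (or constant score) has no defined correlation
    return cov / sqrt(var_x * var_y)


def correlate_genes(expression, scores):
    """Return (gene, r) pairs sorted by |r|, strongest first; undefined rows are skipped."""
    results = []
    for gene, row in expression.items():
        r = pearson(row, scores)
        if r is not None:
            results.append((gene, r))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: five samples with made-up scores and expression values.
scores = [3, 5, 8, 12, 16]
expression = {
    "gene_up": [10, 14, 20, 28, 36],    # rises with the score
    "gene_down": [40, 36, 30, 22, 14],  # falls as the score rises
    "gene_flat": [7, 7, 7, 7, 7],       # constant, so it is skipped
    "gene_noise": [5, 9, 4, 8, 6],      # no clear trend
}

ranked = correlate_genes(expression, scores)
for gene, r in ranked:
    print(f"{gene}\t{r:+.3f}")
```

The ranking uses the absolute value because a coefficient near -1 (expression drops as the score gets worse) is just as strong an association as one near 1.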

I don't know if it is possible to calculate this kind of correlation, so I was also thinking about a semi-quantitative method like clustering, using the score values as the x-axis and clustering the genes only by rows. But this method would only let me look for groups by eye, with no "real" significance.
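The clustering idea could look roughly like the sketch below: the samples are first reordered by score so that the x-axis runs from best to worst, and the genes are then grouped by the shape of their profile using a simple average-linkage agglomerative clustering on z-scored rows. Everything here is an assumption made for illustration: the function names, the choice of Euclidean distance and average linkage, the number of clusters, and the toy data. The naive loop is only usable for a handful of genes; a real data set would need a library implementation.

```python
from math import sqrt


def order_by_score(expression, scores):
    """Reorder every gene's row so samples run from lowest to highest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    sorted_scores = [scores[i] for i in order]
    reordered = {gene: [row[i] for i in order] for gene, row in expression.items()}
    return reordered, sorted_scores


def zscore(row):
    """Centre and scale one row so genes are compared by shape, not magnitude."""
    mean = sum(row) / len(row)
    sd = sqrt(sum((v - mean) ** 2 for v in row) / len(row))
    return [(v - mean) / sd for v in row] if sd > 0 else [0.0] * len(row)


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_genes(expression, n_clusters):
    """Naive average-linkage agglomerative clustering of gene rows."""
    profiles = {gene: zscore(row) for gene, row in expression.items()}
    clusters = [[gene] for gene in profiles]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair of clusters so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Hypothetical toy data: four samples in arbitrary order with made-up scores.
scores = [16, 3, 8, 5]
expression = {
    "rising_a": [40, 10, 25, 15],
    "rising_b": [80, 22, 50, 31],
    "falling_a": [5, 30, 15, 24],
    "falling_b": [12, 61, 33, 50],
}

ordered, sorted_scores = order_by_score(expression, scores)
clusters = cluster_genes(ordered, n_clusters=2)
print(sorted_scores)
print(clusters)
```

Reordering the samples does not change which genes cluster together, since the distance between two rows is the same for any column order; it only makes the profiles readable against the score when they are plotted.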

I hope I have explained my goal for this analysis clearly enough. I would appreciate any help and/or ideas on how to proceed here.

Thanks,

Assa

PS

I have also asked this question here, as I wasn't sure which of the two repositories is the right one to do it.