comparing count data with a vector of quantitative Scores
6.1 years ago
Assa Yeroslaviz ★ 1.6k

I am working on an RNA-Seq data set from mouse. I have done the mapping and the counting and got a table of count data (a matrix of counts for each gene and sample), which looks like this:

                ID     1    2    3    4    5    6 ...    20
ENSMUSG00000058997  5250 5187 1431 3177  575 3870 ...  4375
ENSMUSG00000038079   639  513  630 3672  640  313 ...   292
ENSMUSG00000053168   336 1250 1242 4330  254  184 ...  2146
...

In total I have 20 individual mouse samples from 9 different conditions.

I also have a vector of scores, which we calculated from different phenotypic parameters such as age, weight, longevity and a few others. The score ranges from 3 to 16, where 3 means phenotypically best and 16 phenotypically worst.

Our main goal is to find a way to identify the genes which contribute to these scores. The score was calculated independently of the conditions, so that we can analyse each mouse as a single entity. I do have multiple animals with the same score value, but some score values are unique. For that reason I don't have the usual replicates, so AFAIK regression analysis cannot be done.

To get around the fact that some genes have a much higher read count in some samples than in others, we were thinking about normalizing each row by calculating a fold-change for each gene/sample. This would be done by calculating the average for each gene over all samples and then dividing each gene/sample value by this average.
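To make the normalization concrete, here is a minimal Python sketch. It uses only the first six columns of the three genes shown in the table above; the names `counts` and `fold_change_rows` are hypothetical, and setting all-zero genes to 0 is an assumption made only to avoid dividing by zero.

```python
# Toy input: the first six samples of the three genes shown in the table above.
counts = {
    "ENSMUSG00000058997": [5250, 5187, 1431, 3177, 575, 3870],
    "ENSMUSG00000038079": [639, 513, 630, 3672, 640, 313],
    "ENSMUSG00000053168": [336, 1250, 1242, 4330, 254, 184],
}


def fold_change_rows(counts):
    """Divide every count by the mean of its gene (row) across all samples."""
    normalized = {}
    for gene, row in counts.items():
        mean = sum(row) / len(row)
        # Assumption: a gene with no reads in any sample is kept at 0.0
        # instead of raising a division-by-zero error.
        normalized[gene] = [c / mean if mean > 0 else 0.0 for c in row]
    return normalized


fc = fold_change_rows(counts)
for gene, row in fc.items():
    print(gene, [round(v, 2) for v in row])
```

After this step every expressed gene has a row mean of 1, so values above 1 mark samples where the gene is above its own average.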

So the idea is to calculate some kind of correlation between the score vector and each gene. The closer the correlation coefficient is to 1, the more influence the gene should have on the behaviour of the score.
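One possible way to sketch this per-gene correlation in Python is shown below. It assumes a rank-based (Spearman) coefficient, which is a choice made for this example and not something settled above. The `scores` values are invented purely for illustration, the counts are the first six columns of the table above, and all function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for six samples (NOT real data), within the 3-16 range.
scores = [3, 5, 5, 9, 12, 16]

# First six samples of the three genes shown in the table above.
counts = {
    "ENSMUSG00000058997": [5250, 5187, 1431, 3177, 575, 3870],
    "ENSMUSG00000038079": [639, 513, 630, 3672, 640, 313],
    "ENSMUSG00000053168": [336, 1250, 1242, 4330, 254, 184],
}

rho = {gene: spearman(row, scores) for gene, row in counts.items()}

# Sort by absolute value so strong negative associations are not overlooked.
ranked = sorted(rho, key=lambda g: abs(rho[g]), reverse=True)
for gene in ranked:
    print(gene, round(rho[gene], 3))
```

Dividing a gene's row by its own (positive) mean only rescales that row by a constant, so it would not change this coefficient. A coefficient near -1 is as informative as one near 1, which is why the sketch sorts by absolute value.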

I don't know if it is possible to calculate this kind of correlation, so I was also thinking about a semi-quantitative method like clustering, using the score values as the x-axis and clustering the genes only by rows. But this method would only let me look for groups by eye, with no "real" significance.

I hope I have explained myself clearly enough for my goal in this analysis to be understood. I would appreciate any help and/or ideas on how to proceed here.

Thanks,

Assa

PS

I have also asked this question here, as I wasn't sure which of the two repositories is the right one to do it.

score correlation regression • 2.1k views