I'm new to analyzing RNA-seq data and would like input on how to proceed with data I received from my PI. The goal of the experiment was to compare gene expression in blood cells from different donors, all under the same condition. Some donors have one phenotype (S1, S2, ...) and the others have a second phenotype (P1, P2, ...). I have been given two files: one with raw read counts and one with quantile-normalized values. The files are organized as follows:
Read Count File
Gene   S1      S2      P1
B2M    174991  119507  166104
LYZ    69046   35013   24405
....
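If it helps to work with these files programmatically, here is a minimal Python sketch of a reader for the layout shown above. It assumes whitespace-delimited columns with a "Gene" header followed by one sample name per column; the function name is hypothetical. It returns a {gene: {sample: value}} dictionary and skips rows that don't have one value per sample (such as the "...." placeholder).

```python
def read_expression_table(lines):
    """Parse a whitespace-delimited table into {gene: {sample: value}}.

    Assumes a header row like 'Gene S1 S2 P1' followed by one row per gene.
    """
    rows = iter(lines)
    samples = next(rows).split()[1:]  # drop the 'Gene' column name
    table = {}
    for line in rows:
        fields = line.split()
        # Skip blank lines and rows without exactly one value per sample.
        if len(fields) != len(samples) + 1:
            continue
        gene, values = fields[0], fields[1:]
        table[gene] = {s: float(v) for s, v in zip(samples, values)}
    return table


if __name__ == "__main__":
    example = [
        "Gene S1 S2 P1",
        "B2M 174991 119507 166104",
        "LYZ 69046 35013 24405",
    ]
    counts = read_expression_table(example)
    print(counts["B2M"]["P1"])
```

With a real file, the open file handle can be passed directly (e.g. `with open(path) as fh: counts = read_expression_table(fh)`, where `path` is the file's location), since iterating a file yields one line at a time.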
Quantile Normalized File
Gene   S1       S2       P1
B2M    8449.38  8449.38  2821.43
LYZ    5186.47  1476.66  850.11
....
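For reference, quantile normalization in its common form sorts each sample, replaces the k-th smallest value in every sample with the average of the k-th smallest values across all samples, and puts the values back in their original gene order. Below is a minimal sketch with hypothetical toy values; the exact variant used to produce the file above (tie handling, whether it was applied to raw or log-scale counts) is not known. Genes that hold the same rank in two samples receive exactly the same value, which may be why B2M reads 8449.38 in both S1 and S2.

```python
def quantile_normalize(table):
    """Quantile-normalize {gene: {sample: value}} so every sample has the same distribution.

    Ties are broken by gene order rather than averaged, which is a simplification.
    """
    genes = list(table)
    samples = list(table[genes[0]])
    # For each sample, list the genes from lowest to highest value.
    order = {s: sorted(genes, key=lambda g: table[g][s]) for s in samples}
    # Reference distribution: mean of the k-th smallest value across all samples.
    rank_means = [
        sum(table[order[s][k]][s] for s in samples) / len(samples)
        for k in range(len(genes))
    ]
    # Give each gene the reference value for the rank it holds in that sample.
    normalized = {g: {} for g in genes}
    for s in samples:
        for k, g in enumerate(order[s]):
            normalized[g][s] = rank_means[k]
    return normalized


if __name__ == "__main__":
    # Hypothetical toy values, not taken from the files above.
    toy = {
        "G1": {"S1": 10, "S2": 30, "P1": 5},
        "G2": {"S1": 40, "S2": 60, "P1": 15},
        "G3": {"S1": 20, "S2": 50, "P1": 90},
    }
    for gene, values in quantile_normalize(toy).items():
        print(gene, {s: round(v, 2) for s, v in values.items()})
```

In the toy example G2 is the highest gene in S1 and S2, so it gets the same value in both, while in P1 it drops to the middle rank.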
I have been told to assess differences between samples using the quantile-normalized values. However, if I want to compare the expression of a gene such as B2M between two samples (e.g., S1 and P1), should I first normalize the quantile-normalized values to a housekeeping gene (e.g., GPI) and then compare, or do I simply compare 8449.38 with 2821.43?
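To make the two options concrete, the sketch below computes a log2 ratio for B2M between S1 and P1 directly from the quantile-normalized values, and defines the alternative in which each sample's value is first divided by a reference gene. No GPI values appear in the files shown, so the reference-gene function is only defined, not called; both function names are illustrative. Note that quantile normalization already forces every sample onto the same value distribution, so the reference-gene step would be a second normalization layered on top of it.

```python
import math


def log2_ratio(value_a, value_b):
    """Direct comparison: log2 of sample A's value over sample B's value."""
    return math.log2(value_a / value_b)


def log2_ratio_vs_reference(gene_a, ref_a, gene_b, ref_b):
    """Compare gene/reference ratios (e.g. B2M/GPI) between samples A and B on a log2 scale."""
    return math.log2((gene_a / ref_a) / (gene_b / ref_b))


if __name__ == "__main__":
    # Quantile-normalized B2M values for S1 and P1, from the table above.
    b2m_s1, b2m_p1 = 8449.38, 2821.43
    print(f"log2(S1/P1) for B2M: {log2_ratio(b2m_s1, b2m_p1):.2f}")
```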
Alternatively, should I go back to the read count file and re-analyze the data from there?
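If going back to the counts, the usual route is a count-based package such as DESeq2 or edgeR in R, which handle normalization and statistical testing together. As an illustration of the first step such tools perform, here is a minimal Python sketch of the median-of-ratios size factors that DESeq2 uses by default. It is not a substitute for running the actual package, and the function name is hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for {gene: {sample: raw count}}.

    Dividing each sample's counts by its factor puts samples on a common scale.
    """
    genes = list(counts)
    samples = list(counts[genes[0]])
    # Genes with a zero count in any sample have a geometric mean of zero and are skipped.
    usable = [g for g in genes if all(counts[g][s] > 0 for s in samples)]
    log_geo_mean = {
        g: sum(math.log(counts[g][s]) for s in samples) / len(samples) for g in usable
    }
    # Each sample's factor is the median ratio of its counts to the per-gene geometric means.
    return {
        s: math.exp(median([math.log(counts[g][s]) - log_geo_mean[g] for g in usable]))
        for s in samples
    }


if __name__ == "__main__":
    # Only the two genes shown above, so these factors are illustrative, not meaningful;
    # in practice the full count matrix would be passed in.
    raw = {
        "B2M": {"S1": 174991, "S2": 119507, "P1": 166104},
        "LYZ": {"S1": 69046, "S2": 35013, "P1": 24405},
    }
    factors = size_factors(raw)
    normalized = {g: {s: c / factors[s] for s, c in row.items()} for g, row in raw.items()}
    print({s: round(f, 3) for s, f in factors.items()})
```

Unlike quantile normalization, this rescales each sample by a single number and leaves the shape of each sample's distribution unchanged.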
Furthermore, we'd like to run GSEA between the two phenotypes (S samples versus P samples). Any advice on how to combine the data for the S donors and the P donors to do this?
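One common way to combine donors for GSEA is to collapse each gene into a single score comparing the two groups, then run a preranked analysis (e.g., GSEA Preranked) on the resulting list. The sketch below groups samples by the S/P prefix used in this post, averages log2(value + 1) within each group, and ranks genes by the difference (a log2 fold change). The prefix rule, the pseudocount, the metric, and the function names are all assumptions. The GSEA desktop application can instead take all samples plus phenotype labels and compute its own metric (signal-to-noise by default), which works best with several samples per phenotype.

```python
import math
from statistics import mean


def rank_by_log2_fold_change(table, pseudocount=1.0):
    """Rank genes in {gene: {sample: normalized value}} by mean log2(S) - mean log2(P).

    Assumes sample names start with 'S' or 'P' as in the files above.
    """
    scores = []
    for gene, by_sample in table.items():
        logged = {s: math.log2(v + pseudocount) for s, v in by_sample.items()}
        s_group = [v for s, v in logged.items() if s.startswith("S")]
        p_group = [v for s, v in logged.items() if s.startswith("P")]
        scores.append((gene, mean(s_group) - mean(p_group)))
    # Genes higher in S at the top, genes higher in P at the bottom.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores


def to_rnk(scores):
    """Format (gene, score) pairs as tab-separated lines, one gene per line."""
    return "\n".join(f"{gene}\t{score:.4f}" for gene, score in scores)


if __name__ == "__main__":
    # Quantile-normalized values from the table above.
    qn = {
        "B2M": {"S1": 8449.38, "S2": 8449.38, "P1": 2821.43},
        "LYZ": {"S1": 5186.47, "S2": 1476.66, "P1": 850.11},
    }
    print(to_rnk(rank_by_log2_fold_change(qn)))
```

The two-column, tab-separated output matches the general shape of a .rnk file for a preranked run; with the full gene list it would rank thousands of genes rather than two.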
Any advice, insight, or pointers to relevant questions on Biostars would be greatly appreciated.