I am trying to integrate an RNA-seq dataset with a proteomics dataset. I used DESeq2 to normalize the data. I have two questions about the next steps:
I want to compute z-scores for the RNA-seq data. As I understand it, DESeq2 normalization does not take gene length into account, which means that longer genes could drive the distribution of the normalized counts. For this reason, the normalized data may not be valid for further analysis. Is this correct? If so, is there a step for dealing with gene length? Should I apply some kind of FPKM normalization on top of the DESeq2-normalized data?
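To make the gene-length question concrete, here is a minimal Python sketch with made-up numbers. It assumes the z-score might be taken either per gene across samples or per sample across genes (the post does not fix the direction), and that gene length is a constant per gene. The matrix, the lengths, and the helper name `zscore` are all hypothetical. It only checks the arithmetic: dividing each gene by its own length does not change a per-gene z-score, but it does change a per-sample one.

```python
import numpy as np

# Toy normalized counts: rows = genes, columns = samples (made-up values).
norm_counts = np.array([
    [100.0, 150.0, 200.0, 250.0],
    [1000.0, 1200.0, 900.0, 1100.0],
    [10.0, 30.0, 20.0, 40.0],
])

# Hypothetical gene lengths in kilobases, one value per gene (row).
gene_length_kb = np.array([1.0, 10.0, 0.5])


def zscore(matrix, axis):
    """Center and scale along the given axis (sample standard deviation)."""
    mean = matrix.mean(axis=axis, keepdims=True)
    sd = matrix.std(axis=axis, ddof=1, keepdims=True)
    return (matrix - mean) / sd


# Length-scaled version: each gene's row divided by its own length.
length_scaled = norm_counts / gene_length_kb[:, None]

# Per-gene z-score (across samples): the constant length cancels out.
z_gene_raw = zscore(norm_counts, axis=1)
z_gene_scaled = zscore(length_scaled, axis=1)
same_per_gene = np.allclose(z_gene_raw, z_gene_scaled)

# Per-sample z-score (across genes): length scaling changes the result.
z_sample_raw = zscore(norm_counts, axis=0)
z_sample_scaled = zscore(length_scaled, axis=0)
same_per_sample = np.allclose(z_sample_raw, z_sample_scaled)

print("per-gene z-scores unchanged by length scaling:", same_per_gene)
print("per-sample z-scores unchanged by length scaling:", same_per_sample)
```

So whether gene length matters here seems to depend on which direction the z-score is computed in, which is part of what I am asking.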
For the integration with proteomics, should I (1) normalize the two datasets first and then work on the overlapping genes/proteins, or (2) take the overlap of the two datasets first and then normalize only that subset of genes/proteins? I think approach (2) makes little to no sense for normalization, because I would throw out many non-overlapping genes and proteins, which may affect the original distribution of the data.
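The two orderings can be sketched in Python as follows. This is only an illustration under loud assumptions: the values and gene names are invented, and per-sample median centering is a simple stand-in chosen for the example, not DESeq2's own normalization. The helper `median_center` is hypothetical. The point is just that the same genes end up with different values depending on whether the subset is taken before or after normalizing.

```python
import pandas as pd

# Toy log-scale RNA-seq values: rows = genes, columns = samples (made up).
rna = pd.DataFrame(
    {"s1": [5.0, 7.0, 9.0, 11.0, 2.0], "s2": [6.0, 8.0, 10.0, 12.0, 1.0]},
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"],
)

# Toy proteomics table; only GENE_A and GENE_B overlap with the RNA-seq data.
prot = pd.DataFrame(
    {"s1": [20.0, 22.0, 25.0], "s2": [21.0, 23.0, 24.0]},
    index=["GENE_A", "GENE_B", "GENE_F"],
)


def median_center(df):
    """Stand-in normalization: subtract each sample's median (per column)."""
    return df - df.median(axis=0)


# Genes/proteins present in both tables.
shared = rna.index.intersection(prot.index)

# Approach (1): normalize on all genes, then keep the overlap.
rna_opt1 = median_center(rna).loc[shared]

# Approach (2): keep the overlap first, then normalize the subset.
rna_opt2 = median_center(rna.loc[shared])

print(rna_opt1)
print(rna_opt2)
```

With this stand-in, the per-sample median is estimated from five genes in approach (1) and from only two genes in approach (2), so the centered values differ.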
Can anybody share some thoughts? :)