I am analyzing RNA-seq data produced with 3' mRNA-Seq.
I calculated gene counts using htseq-count, and I need to normalize them before performing a Kaplan-Meier analysis. I would like to use CPM normalization, since with 3' mRNA-Seq data I can't also normalize by gene length.
I have a question about the CPM normalization method. The formula is CPM = (counts on the feature / library size) × 1,000,000, so it normalizes each count by the library size.
I was wondering whether the library size should be:
- the number of raw sequence reads produced by the sequencing; or
- the number of reads uniquely mapped to the features, i.e. the sum of the counts of all features given by htseq-count.
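To make the difference between the two options concrete, here is a minimal Python sketch that applies the CPM formula above with each candidate library size. The gene names, the counts, and the raw read total are made-up numbers for illustration only, and `cpm` is a hypothetical helper, not a function from HTSeq or any other package.

```python
def cpm(counts, library_size):
    """Counts per million for one sample: count * 1e6 / library_size."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {gene: c * 1_000_000 / library_size for gene, c in counts.items()}


# Hypothetical per-gene counts for one sample (illustrative values only).
# With real htseq-count output, the special counter lines starting with "__"
# (e.g. __no_feature, __ambiguous) would be left out of this dictionary.
counts = {"geneA": 500, "geneB": 1500, "geneC": 3000}

# Option 1: library size = number of raw reads (hypothetical value).
raw_reads = 10_000
cpm_raw = cpm(counts, raw_reads)

# Option 2: library size = sum of the counts assigned to features.
assigned_reads = sum(counts.values())
cpm_assigned = cpm(counts, assigned_reads)

for gene in counts:
    print(gene, cpm_raw[gene], cpm_assigned[gene])
```

With the second option the CPM values of a sample always sum to one million; with the first they sum to less whenever some reads are unmapped or not assigned to a feature.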