I understand the RPKM formula is as follows:
C = Number of reads mapped to a gene
N = Total mapped reads in the experiment
L = exon length in base-pairs for a gene
Equation: RPKM = (10^9 * C) / (N * L)
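To make the formula concrete, here is a minimal Python sketch of it. The function name `rpkm` and the example numbers are made up for illustration only; they are not from any real dataset or library.

```python
def rpkm(count, total_mapped_reads, length_bp):
    """Return RPKM = (10^9 * C) / (N * L).

    count              -- C, reads mapped to the gene
    total_mapped_reads -- N, total mapped reads in the experiment
    length_bp          -- L, exon length of the gene in base pairs
    """
    if total_mapped_reads <= 0 or length_bp <= 0:
        raise ValueError("N and L must be positive")
    return (1e9 * count) / (total_mapped_reads * length_bp)


# Hypothetical values: 500 reads on a 2,000 bp gene, 20 million mapped reads.
example = rpkm(count=500, total_mapped_reads=20_000_000, length_bp=2_000)
print(example)  # 12.5
```

Whatever is chosen for N only rescales every gene by the same factor within one sample, but it does change how values compare across samples.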
I have the counts (from HTSeq) and the transcript length (retrieved from the Ensembl API) for each gene.
My question is: for the total mapped reads (N), should I count only the reads that fall in exons across all genes? If that's right, can I just add up all the gene counts from the HTSeq output to get the total mapped reads?
Or should the total mapped reads be ALL the mapped reads in the BAM file?
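For the first option (summing the gene counts), a rough sketch might look like the one below. It assumes the usual htseq-count output of two tab-separated columns (feature ID, count), with the special counter rows at the end prefixed by `__` (for example `__no_feature`, `__ambiguous`); older HTSeq versions may write those rows without the prefix, so this filter is an assumption. The gene IDs and counts are toy values, and `sum_gene_counts` is a hypothetical helper name.

```python
import io


def sum_gene_counts(handle):
    """Sum per-gene counts from htseq-count style lines, skipping '__' rows."""
    total = 0
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        feature, count = line.split("\t")[:2]
        if feature.startswith("__"):
            continue  # special counters (assumed '__' prefix), not genes
        total += int(count)
    return total


# Toy input standing in for an HTSeq output file.
toy = "geneA\t100\ngeneB\t250\n__no_feature\t40\n__ambiguous\t10\n"
gene_total = sum_gene_counts(io.StringIO(toy))
print(gene_total)  # 350
```

With a real file, the same function could be called on `open("counts.txt")`, where the file name is a placeholder. The resulting sum only covers reads assigned to a gene, so it will generally be smaller than the number of mapped reads in the BAM file.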