I have a simple question, but I just want to double-check I'm not doing something stupid.
I have paired-end RNA-seq data for which I have used featureCounts to quantify raw counts. I now want to normalize using the TPM formula. I read this blog:
which simply says: divide the read counts by gene length in kilobases to give reads per kilobase (RPK), sum all the RPK values and divide by a million to get a per-million scaling factor, then divide every RPK value by this scaling factor.
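If I have understood the blog correctly, the three steps would look roughly like the sketch below. The function name `tpm` and the toy numbers are made up for illustration, and it assumes the gene lengths are given in bases, so they are divided by 1,000 first.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to TPM, given gene lengths in base pairs (assumed)."""
    # Step 1: reads per kilobase = count / (gene length in kilobases).
    rpk = [c / (l / 1000) for c, l in zip(counts, lengths_bp)]
    # Step 2: per-million scaling factor = sum of all RPK values / 1e6.
    scale = sum(rpk) / 1e6
    # Step 3: TPM = each RPK divided by the scaling factor.
    return [r / scale for r in rpk]

# Toy example with made-up numbers for three genes.
values = tpm([10, 20, 30], [1000, 2000, 3000])
print(values)
print(sum(values))  # TPM values for one sample should add up to about one million
```

Note that this would fail with a division by zero if every count in the sample were zero.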
My output from featureCounts looks something like this:
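| Geneid | Chr | Start | End | Strand | Length | sample.bam |
|---|---|---|---|---|---|---|
| NM_032291 | chr1 | 66999639 etc | 67000051 etc | + etc etc | 10934 | 25 |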
Do I use the values in the "Length" column directly as the gene length, or do I have to convert them to kilobases first? The Subread manual doesn't say much about this length value. And is the value in my "sample.bam" column definitely the read count I need?
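For concreteness, this is roughly how I would pull those two columns out of the file. It is only a sketch under the assumptions I am asking about: that "Length" is in bases (so it gets divided by 1,000) and that "sample.bam" holds the raw read count. The helper name `read_featurecounts`, the header comment text, and the second gene are made up for illustration.

```python
import csv
import io

def read_featurecounts(handle, sample_column):
    """Yield (gene_id, length, count) tuples from featureCounts-style text."""
    # Skip the leading '#' comment line that featureCounts writes.
    rows = (line for line in handle if not line.startswith("#"))
    for row in csv.DictReader(rows, delimiter="\t"):
        yield row["Geneid"], int(row["Length"]), int(row[sample_column])

# Made-up example in the same layout as my file (second gene is hypothetical).
example = (
    "# Program:featureCounts (placeholder comment line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "NM_032291\tchr1\t66999639\t67000051\t+\t10934\t25\n"
    "GENE_B\tchr1\t100\t1099\t+\t1000\t10\n"
)
records = list(read_featurecounts(io.StringIO(example), "sample.bam"))

# Assumption: Length is in bases, so divide by 1000 to get kilobases.
rpk = {gene: count / (length / 1000) for gene, length, count in records}
print(rpk)
```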