Lets say the featureCounts output is this (for convenience only contain 2 genes:
Geneid Chr Start End Strand Length myfile.bam
Xkr4 chr1;chr1;chr1 3214482;3421702;3670552 3216968;3421901;3671498 -;-;- 3634 100
Npbwr1 chr1 5913707 5917398 - 3692 30
As far as I know the formula of RPKM is
RPKM = count_assigned_to_gene / (gene_length/1000 * totalReads /1e6)
My question is where can I get totalReads? Is it directly from the featureCounts output (100 + 30 ==130)?
Xkr4 = 100 / (3634/1000 * 130 / 1e6)
Or should I get it from BAM files e.g. bamtools stats
.
@b.nota in other words the
totalReads
in the calculation is not the total assigned reads to the genes? So I have to directly count from the BAM file? Would prefer to recode cause later need to calculate TPM as well.From edgeR help page:
lib.size
: library size, defaults tocolSums(x)
So to answer your question, your
totalReads
should be the sum of the whole sample column indeed. No need to use your BAM file.@b.nota i.e. 100 + 30 = 130 (sum of 7th column of featureCounts output)??
Yes, those are your read counts. If you have more samples you can use featureCounts for all BAM files in one run.