Question

On calculating RPKM values for RNA-seq data

Yongjie Zhang · 7.8 years ago

Hello,

I performed RNA-seq on a sample and want to calculate RPKM values, but I have two questions.

1) Since paired-end Illumina sequencing was performed, can I use either single reads or fragments (i.e. paired reads) to calculate RPKM?

2) Suppose the total number of clean reads is A, of which B are mitochondrial reads. If I only want to calculate RPKM for mitochondrial genes, should I use A or B as the total in the calculation?

Thanks for any comments.

Yongjie

Last updated 7.8 years ago by dariober

To be unbiased, always consider the complete profile (all genes) and use the total number of reads mapped to them as the library size when calculating RPKM/FPKM.
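For reference, RPKM divides a gene's read count by the gene length in kilobases and by the library size in millions of reads. A minimal Python sketch of that, assuming you already have per-gene counts and gene lengths; the gene names and numbers are made up for illustration, and the library size is taken as the sum of reads assigned to all genes, as the comment suggests:

```python
def rpkm(counts, lengths_bp):
    """Compute RPKM for every gene.

    counts: dict gene -> number of reads (or fragments) assigned to it
    lengths_bp: dict gene -> gene length in base pairs
    The library size is the total count over ALL genes (complete profile).
    """
    library_size = sum(counts.values())
    return {
        # reads / (length in kb) / (library size in millions)
        gene: count * 1e9 / (lengths_bp[gene] * library_size)
        for gene, count in counts.items()
    }


# Hypothetical toy data: three genes
counts = {"geneA": 500, "geneB": 300, "geneC": 200}
lengths_bp = {"geneA": 2000, "geneB": 1000, "geneC": 500}
print(rpkm(counts, lengths_bp))
```

With 1,000 reads in total, geneA gets 500 / 2 kb / 0.001 M = 250,000 RPKM; the toy library is tiny, which is why the values are so large.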

Comment by EagleEye · 7.8 years ago

Answer 1 · 2016-07-22

Can I use either single reads or fragments (i.e. paired reads) to calculate RPKM?

What you definitely do not want to do is count read 1 and read 2 separately, since each fragment would then be counted twice. Counting fragments (i.e. both mates mapping to the same gene in the correct orientation, I suppose) should be more appropriate, but in practice I think counting only read 1s should give essentially the same result.
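To make the three options concrete, here is a toy Python sketch. It assumes a hypothetical, already-simplified input in which each aligned mate is a tuple (read_name, mate, gene, proper_pair); in a real pipeline this information would come from the BAM file plus a gene annotation, and the "proper pair" flag stands in for the correct-orientation check.

```python
from collections import Counter

# Hypothetical simplified alignments: (read_name, mate, gene, proper_pair)
alignments = [
    ("r1", 1, "geneA", True), ("r1", 2, "geneA", True),
    ("r2", 1, "geneA", True), ("r2", 2, "geneA", True),
    ("r3", 1, "geneB", True), ("r3", 2, "geneB", True),
    ("r4", 1, "geneB", False), ("r4", 2, "geneC", False),  # discordant pair
]


def count_all_mates(alns):
    """Counts read 1 and read 2 separately: each fragment is counted twice."""
    return Counter(gene for _, _, gene, _ in alns)


def count_read1_only(alns):
    """Counts only the first mate of each pair."""
    return Counter(gene for _, mate, gene, _ in alns if mate == 1)


def count_fragments(alns):
    """Counts a fragment once if both mates map to the same gene as a proper pair."""
    genes_by_name = {}
    for name, _, gene, proper in alns:
        if proper:
            genes_by_name.setdefault(name, []).append(gene)
    return Counter(
        genes[0]
        for genes in genes_by_name.values()
        if len(genes) == 2 and genes[0] == genes[1]
    )


print(count_all_mates(alignments))
print(count_read1_only(alignments))
print(count_fragments(alignments))
```

Counting every mate gives geneA 4, twice what the other two methods give. Read-1 counting and fragment counting agree for the concordant pairs and differ only on pairs like r4, whose mates land on different genes.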

If I only want to calculate RPKM for mitochondrial genes, should I use A or B during calculation?

I guess it depends on whether you are interested in the concentration of a mitochondrial gene relative to the other mitochondrial genes or relative to all genes. In the first case use the mitochondrial reads (B); otherwise use all reads (A).

For example, in one sample you have 1M reads in the genome, 10k reads on chrM and 100 reads on gene X on chrM. In another sample you have 1M reads in the genome, 1k reads on chrM and 100 reads on gene X. In this case the genome-wide concentration of X is pretty much the same in the two samples (100/1M in both). But relative to the mitochondrial genome, the second sample is much richer in gene X (100/10k = 1% vs 100/1k = 10%).
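The same example can be run through the RPKM formula with either A or B as the library size. The example does not give a length for gene X, so the 1,000 bp below is a hypothetical value; since it is the same in both samples, it does not affect the comparison.

```python
def rpkm(count, length_bp, library_size):
    """RPKM for a single gene: reads per kilobase of gene per million library reads."""
    return count * 1e9 / (length_bp * library_size)


GENE_X_LENGTH_BP = 1000  # hypothetical; the example above gives no length

samples = {
    # name: (reads in the genome = A, reads on chrM = B, reads on gene X)
    "sample1": (1_000_000, 10_000, 100),
    "sample2": (1_000_000, 1_000, 100),
}

for name, (total_a, mito_b, gene_x) in samples.items():
    genome_wide = rpkm(gene_x, GENE_X_LENGTH_BP, total_a)
    within_mito = rpkm(gene_x, GENE_X_LENGTH_BP, mito_b)
    print(f"{name}: RPKM using A = {genome_wide:.0f}, using B = {within_mito:.0f}")
```

This prints an RPKM of 100 for both samples when A is used, but 10000 vs 100000 when B is used: the same tenfold difference as 1% vs 10% above.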