Question

Most efficient way to convert Counts to RPKM

0

3.4 years ago

dk0319 ▴ 70

I obtained gene counts in order to perform differential expression analysis. Now I want to generate an MA plot comparing fold change to RPKM. Is there a streamlined way to do this directly from a file containing gene IDs and counts, without having to work with the BAM file from which the counts were generated? Cheers

R rna-seq • 4.1k views

ADD COMMENT • link updated 3.4 years ago by ATpoint 81k • written 3.4 years ago by dk0319 ▴ 70

1

3.4 years ago

ATpoint 81k

You can use (log)CPMs from edgeR starting from a count matrix. It is a one-liner, see Basic normalization, batch correction and visualization of RNA-seq data. Don't use any of the naive metrics that only scale by library size; that is typically not sufficient because it fails to correct for library composition. edgeR does have an rpkm function, which is simply its normalized counts divided by gene length, but I would not use it here: the differential testing does not consider gene length, and the MA plot is meant to visualize the DE results. So just use cpm() as described in the linked post.
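To make the arithmetic concrete, here is a minimal plain-Python sketch of how CPM and RPKM relate for a single sample. It is not edgeR's implementation: the function names, the toy counts and lengths, and the `norm_factor` parameter are all assumptions for illustration. `norm_factor` stands in for a composition-normalization factor computed elsewhere, and the default of 1.0 reduces to the plain library-size scaling the answer warns against.

```python
def cpm(counts, norm_factor=1.0):
    """Counts per million for one sample.

    norm_factor is a hypothetical, precomputed normalization factor;
    with the default 1.0 this is plain library-size scaling.
    """
    effective_lib_size = sum(counts) * norm_factor
    return [c / effective_lib_size * 1e6 for c in counts]


def rpkm(counts, lengths_bp, norm_factor=1.0):
    """CPM divided by gene length in kilobases."""
    per_million = cpm(counts, norm_factor)
    return [v / (length / 1000) for v, length in zip(per_million, lengths_bp)]


# Toy data (made up): three genes in one sample.
counts = [100, 300, 600]
lengths_bp = [1000, 2000, 500]

sample_cpm = cpm(counts)
sample_rpkm = rpkm(counts, lengths_bp)
```

The only difference between the two is the division by gene length, which is why gene length adds nothing when the goal is to visualize per-gene DE results.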

ADD COMMENT • link 3.4 years ago by ATpoint 81k

score 2 · Accepted Answer · 2020-11-19

2

3.4 years ago

h.mon 35k

It is better to use TPM than RPKM/FPKM, see Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples (this point has been made repeatedly here; it will probably appear in the Similar posts list on the right).

However, for an MA plot even TPMs are unnecessary: edgeR (maPlot) and DESeq2 (plotMA) have functions to draw MA plots directly from the counts.
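For reference, the quantities behind an MA plot are simple to compute by hand. The sketch below uses the generic definition (M is the log2 ratio, A is the average log2 abundance) on two vectors of already normalized values. The function name, the `prior` pseudocount, and the toy values are assumptions for illustration; maPlot and plotMA handle normalization and plotting in their own way and are not reproduced here.

```python
import math


def ma_values(control, treated, prior=0.5):
    """Return a list of (A, M) pairs, one per gene.

    control and treated are per-gene normalized abundances (e.g. CPMs).
    prior is a hypothetical pseudocount that avoids log2(0).
    """
    points = []
    for c, t in zip(control, treated):
        log_c = math.log2(c + prior)
        log_t = math.log2(t + prior)
        a = 0.5 * (log_c + log_t)  # average log2 abundance (x-axis)
        m = log_t - log_c          # log2 fold change (y-axis)
        points.append((a, m))
    return points


# Toy data (made up): gene 1 goes up, gene 2 goes down, gene 3 is absent.
control = [7.5, 31.5, 0.0]
treated = [31.5, 7.5, 0.0]
points = ma_values(control, treated)
```

Neither axis involves gene length, so counts (or CPMs) are all that is needed.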

ADD COMMENT • link 3.4 years ago by h.mon 35k

0

Is there a workflow for generating TPMs from counts?

I am attempting to design a reporter construct using a list of significantly differentially expressed genes, while taking into account the number of transcripts present in my control samples compared to my test samples. My hope is that this information would identify which genes have low enough expression in the test relative to the control to make a reporter that is highly specific and sensitive for monitoring protein function.

ADD REPLY • link 3.4 years ago by dk0319 ▴ 70

1

TPM itself is simple, see Raw counts to TPM in R. But you seem to be working on something not-so-standard (at least it sounds that way), so if you want guidance or opinions on whether your strategy makes sense, you would need to explain better what you are actually doing, what the setup is, and what kinds of data you have. You also seem to be mixing transcript-level and gene-level counts: you say transcripts but also talk about differential genes, and those are not the same.
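As a rough illustration of why it is simple, here is a plain-Python sketch of the usual counts-to-TPM arithmetic for one sample: divide each count by the feature length, then rescale so the values sum to one million. The function name and the toy data are assumptions for illustration, and using a single fixed length per gene is itself an approximation.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for one sample.

    counts and lengths_bp are parallel per-feature lists;
    lengths are in base pairs.
    """
    # Length-normalized rate: reads per kilobase of feature.
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    # Rescale so the sample sums to one million.
    return [r / total * 1e6 for r in rates]


# Toy data (made up): three features in one sample.
counts = [100, 300, 600]
lengths_bp = [1000, 2000, 500]
sample_tpm = tpm(counts, lengths_bp)
```

Unlike RPKM, the length normalization happens before the per-million scaling, so the values in every sample sum to the same total.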

ADD REPLY • link 3.4 years ago by ATpoint 81k

1

Salmon can output counts and TPMs, and is really fast - it will run a few dozen samples in less than one hour.

TPMs estimated from gene counts are bad estimates, see a good explanation here: DESeq2: Is it possible to convert read counts to expression values via TPM and return these values?.

ADD REPLY • link 3.4 years ago by h.mon 35k