Question: Most efficient way to convert Counts to RPKM
10 days ago by
dk0319 (0) wrote:

I obtained gene counts in order to perform differential expression analysis. Now I want to generate an MA plot comparing fold change to RPKM. Is there a streamlined way to do this directly from a file containing gene IDs and counts, without having to work with the BAM file from which the counts were generated? Cheers

rna-seq R • 102 views
modified 10 days ago by ATpoint (42k) • written 10 days ago by dk0319 (0)
10 days ago by
h.mon (31k) wrote:

It is better to use TPM than FPKM, see "Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples" (this point has been made repeatedly here; it will probably appear in the Similar posts on the right).

However, for an MA plot even TPMs are unnecessary: edgeR (maPlot) and DESeq2 (plotMA) have functions to draw MA plots directly from the counts.
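To make concrete what an MA plot needs from the counts, here is a minimal Python sketch of the two coordinates: A (mean log2 abundance) and M (log2 fold change). It is not what maPlot or plotMA do internally; it only scales by library size, and the function name `ma_values` and the `prior_count` pseudocount are assumptions made for this illustration.

```python
import math

def ma_values(counts_a, counts_b, prior_count=0.5):
    """Return one (A, M) pair per gene for two samples of raw counts.

    Hypothetical helper for illustration only: scales by library size
    alone and does not correct for library composition.
    """
    lib_a = sum(counts_a)
    lib_b = sum(counts_b)
    points = []
    for a, b in zip(counts_a, counts_b):
        # log2 counts per million; prior_count avoids taking log2 of zero
        log_a = math.log2((a + prior_count) / lib_a * 1e6)
        log_b = math.log2((b + prior_count) / lib_b * 1e6)
        m = log_b - log_a              # M: log2 fold change (b versus a)
        avg = (log_a + log_b) / 2      # A: mean log2 abundance
        points.append((avg, m))
    return points

if __name__ == "__main__":
    control = [10, 200, 0, 55]
    treated = [40, 180, 3, 60]
    for avg, m in ma_values(control, treated):
        print(f"A={avg:.2f}  M={m:.2f}")
```

Plotting M against A for every gene gives the MA plot; gene length never enters, which is why RPKM is not needed for it.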

written 10 days ago by h.mon (31k)

Is there a workflow for generating TPMs from counts?

I am attempting to design a reporter construct using a list of significantly differentially expressed genes, but with consideration of the number of transcripts that are present in my control samples compared to my test samples. My hope is that this information would identify which genes have low enough expression in the test relative to the control to make a reporter that is highly specific and sensitive for monitoring protein function.

written 10 days ago by dk0319 (0)

TPM itself is simple, see "Raw counts to TPM in R". However, you seem to be working on something not-so-standard (at least it sounds that way). If you want guidance with that, or opinions on whether your strategy makes sense, you would need to explain better what you are actually doing, what the setup is and what kinds of data you have. You also seem to be mixing transcript-level and gene-level counts here: you say transcripts but also talk about differential genes, and those are not the same.
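For reference, the TPM arithmetic can be sketched in a few lines of Python. This is a hedged illustration, not the code from the linked post: the function name `counts_to_tpm` is made up here, and it assumes you already have one length in base pairs per gene, which a plain counts file does not contain.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts for ONE sample to transcripts per million.

    Hypothetical helper for illustration. lengths_bp must be supplied
    by the user (one length per gene, in base pairs).
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: reads per kilobase, which removes the gene-length effect
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    # Step 2: rescale so the values of the sample sum to one million
    return [r / total * 1e6 for r in rates]

if __name__ == "__main__":
    tpm = counts_to_tpm([100, 200, 300], [1000, 2000, 3000])
    print(tpm)
```

Because the per-kilobase step comes before the rescaling, the TPMs of every sample sum to the same total.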

modified 10 days ago • written 10 days ago by ATpoint (42k)

Salmon can output counts and TPMs, and is really fast - it will run a few dozen samples in less than one hour.

TPMs estimated from gene counts are bad estimates; see a good explanation here: "DESeq2: Is it possible to convert read counts to expression values via TPM and return these values?"

written 10 days ago by h.mon (31k)
10 days ago by
ATpoint (42k) wrote:

You can use (log)CPMs from edgeR starting from a count matrix. It is a one-liner, see "Basic normalization, batch correction and visualization of RNA-seq data". Don't use any of the naive metrics that only scale by library size; that is typically not sufficient because it fails to correct for library composition. edgeR does have an rpkm function, which is simply its normalized CPMs divided by gene length in kilobases, but I would not use it: the differential testing does not consider gene length, and the MA plot is meant to visualize the DE results. So just use cpm() as described in the linked post.
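The relationship between the two metrics can be sketched as follows. This is a plain Python illustration of the arithmetic, not edgeR's implementation: the function names simply echo cpm and rpkm, and `norm_factor` is a stand-in for a composition normalization factor (such as the TMM factor edgeR estimates) that this sketch takes as given and does not compute.

```python
def cpm(counts, norm_factor=1.0):
    """Counts per million for ONE sample.

    norm_factor is an assumed, externally supplied scaling factor;
    with the default of 1.0 this only scales by library size.
    """
    effective_lib_size = sum(counts) * norm_factor
    return [c / effective_lib_size * 1e6 for c in counts]

def rpkm(counts, lengths_bp, norm_factor=1.0):
    """CPM divided by gene length in kilobases."""
    return [
        value / (length / 1000)
        for value, length in zip(cpm(counts, norm_factor), lengths_bp)
    ]

if __name__ == "__main__":
    counts = [100, 300, 600]
    lengths = [1000, 2000, 500]
    print(cpm(counts))
    print(rpkm(counts, lengths))
```

The only difference between the two is the division by gene length, which the differential test never uses.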

modified 10 days ago • written 10 days ago by ATpoint (42k)