I obtained gene counts in order to perform differential expression analysis. Now I want to generate an MA plot comparing fold change to RPKM. Is there a streamlined way to do this directly from a file containing gene IDs and counts, without having to go back to the BAM file from which the counts were generated? Cheers
It is better to use TPM than RPKM/FPKM; see "Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples" (this point has been made repeatedly here, so it will probably appear under Similar posts on the right).
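To make the consistency point concrete, here is a minimal Python sketch (standard library only) that computes RPKM and TPM for the same toy counts. The counts and gene lengths are made up for illustration. Per sample, TPM always sums to one million, while the RPKM sums differ between samples, so RPKM values do not describe the same "total" in every sample.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads, for one sample."""
    total = sum(counts)
    return [c / (l / 1e3) / (total / 1e6) for c, l in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to sum to 1e6."""
    rate = [c / (l / 1e3) for c, l in zip(counts, lengths_bp)]
    scale = sum(rate) / 1e6
    return [r / scale for r in rate]

# Hypothetical toy data: 3 genes, 2 samples with different composition.
lengths = [1000, 2000, 500]
sample_a = [100, 200, 50]
sample_b = [300, 100, 50]

for name, counts in [("A", sample_a), ("B", sample_b)]:
    print(name,
          "RPKM sum:", round(sum(rpkm(counts, lengths)), 1),
          "TPM sum:", round(sum(tpm(counts, lengths)), 1))
```

The only difference between the two is the order of operations: TPM divides by length before scaling to a per-million total, RPKM after.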
However, for an MA plot even TPMs are unnecessary: edgeR (maPlot) and DESeq2 (plotMA) have functions to draw MA plots starting directly from the counts.
You can get (log)CPMs from edgeR starting from a count matrix; it is a one-liner, see the post "Basic normalization, batch correction and visualization of RNA-seq data".
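For readers outside R, a minimal Python sketch of what an MA plot is built from: per-gene log2 CPMs in two samples, then M (their difference, the log2 fold change) and A (their mean). It reads a hypothetical tab-separated counts table with gene_id, ctrl and treat columns. Two assumptions worth flagging: the transform is the voom-style log2((count + 0.5) / (library size + 1) * 1e6), not edgeR's cpm(..., log=TRUE), which scales its prior count by library size, so values will differ slightly; and library sizes here are plain column sums with no composition correction.

```python
import csv
import io
import math

def log_cpm(counts, lib_size):
    """voom-style log2 CPM (assumption: not identical to edgeR's log CPM)."""
    return [math.log2((c + 0.5) / (lib_size + 1) * 1e6) for c in counts]

def ma_coordinates(counts_x, counts_y):
    """Per-gene M = log2CPM(y) - log2CPM(x) and A = mean of the two log2CPMs.
    Library sizes are raw column sums, i.e. no composition correction."""
    lx = log_cpm(counts_x, sum(counts_x))
    ly = log_cpm(counts_y, sum(counts_y))
    m = [b - a for a, b in zip(lx, ly)]
    a = [(p + q) / 2 for p, q in zip(lx, ly)]
    return m, a

# Hypothetical counts file: gene ID, control count, treatment count.
table = io.StringIO(
    "gene_id\tctrl\ttreat\n"
    "geneA\t100\t200\n"
    "geneB\t50\t50\n"
    "geneC\t0\t10\n"
    "geneD\t1000\t900\n"
)
rows = list(csv.DictReader(table, delimiter="\t"))
gene_ids = [r["gene_id"] for r in rows]
ctrl = [int(r["ctrl"]) for r in rows]
treat = [int(r["treat"]) for r in rows]

m, a = ma_coordinates(ctrl, treat)
for gid, mi, ai in zip(gene_ids, m, a):
    print(f"{gid}\tM={mi:.3f}\tA={ai:.3f}")
```

To draw the plot, pass a as the x values and m as the y values to any scatter plot function (for example matplotlib's scatter). The pseudocount keeps genes with a zero count (geneC) finite.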
Don't use any of these naive metrics that only scale by library size; that is typically not sufficient because it does not correct for differences in library composition.
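A toy illustration of the composition problem, in Python with made-up counts: in sample 2 a single gene absorbs all the extra reads while the other four genes are unchanged. Scaling by total counts makes the four unchanged genes look about 3-fold down. The median-of-ratios size factors used by DESeq/DESeq2 are swapped in here as one example of a composition-aware method (edgeR uses a different one, TMM) and keep those genes at a fold change of 1.

```python
import math
from statistics import median

def total_count_factors(samples):
    """Naive scaling: each library size relative to the mean library size."""
    totals = [sum(s) for s in samples]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]

def median_of_ratios_size_factors(samples):
    """DESeq-style size factors: per sample, the median ratio of its counts to
    each gene's geometric mean across samples (genes with any zero are skipped)."""
    log_geo_means = []
    for i in range(len(samples[0])):
        vals = [s[i] for s in samples]
        if all(v > 0 for v in vals):
            log_geo_means.append((i, sum(math.log(v) for v in vals) / len(vals)))
    return [median(math.exp(math.log(s[i]) - lg) for i, lg in log_geo_means)
            for s in samples]

# Hypothetical counts: genes 0-3 unchanged, gene 4 strongly up in sample 2.
samples = [
    [100, 100, 100, 100, 100],
    [100, 100, 100, 100, 1100],
]

def log2_fc(gene, factors):
    """log2 fold change of sample 2 over sample 1 after dividing by size factors."""
    return math.log2((samples[1][gene] / factors[1]) / (samples[0][gene] / factors[0]))

lfc_naive = log2_fc(0, total_count_factors(samples))
lfc_mor = log2_fc(0, median_of_ratios_size_factors(samples))
print("unchanged gene, total-count scaling:", round(lfc_naive, 3))
print("unchanged gene, median-of-ratios:   ", round(lfc_mor, 3))
```

The median is robust here because most genes are not differentially expressed, which is the key assumption behind both median-of-ratios and TMM.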
edgeR does have an rpkm() function, which simply divides its normalized counts by gene length, but I would not use it: gene length plays no role in the differential testing, and the MA plot is meant to visualize the DE results, so just use cpm() as described in the linked post.
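Why gene length adds nothing to an MA plot can be checked with a small Python sketch, assuming RPKM is CPM divided by gene length in kilobases (edgeR's rpkm() uses its normalized library sizes; plain column sums are used here for brevity, and the data are hypothetical). Because the same length divides both samples, it cancels in the fold change: M is identical on the CPM and RPKM scales, and A is only shifted by log2 of the length in kb.

```python
import math

def cpm(counts, lib_size):
    """Counts per million for one sample."""
    return [c / lib_size * 1e6 for c in counts]

def rpkm_from_cpm(cpm_values, lengths_bp):
    """RPKM = CPM divided by gene length in kilobases."""
    return [v / (l / 1e3) for v, l in zip(cpm_values, lengths_bp)]

# Hypothetical toy data: 2 genes of very different length, 2 samples.
lengths = [500, 4000]
ctrl = [80, 400]
treat = [160, 300]

cpm_c = cpm(ctrl, sum(ctrl))
cpm_t = cpm(treat, sum(treat))
rpkm_c = rpkm_from_cpm(cpm_c, lengths)
rpkm_t = rpkm_from_cpm(cpm_t, lengths)

# M (log2 fold change) is the same on both scales because the length cancels.
m_cpm = [math.log2(t / c) for c, t in zip(cpm_c, cpm_t)]
m_rpkm = [math.log2(t / c) for c, t in zip(rpkm_c, rpkm_t)]
# A (mean log2 expression) on the RPKM scale is A on the CPM scale minus log2(length in kb).
a_cpm = [(math.log2(c) + math.log2(t)) / 2 for c, t in zip(cpm_c, cpm_t)]
a_rpkm = [(math.log2(c) + math.log2(t)) / 2 for c, t in zip(rpkm_c, rpkm_t)]

for g in range(len(lengths)):
    print(g, round(m_cpm[g], 6), round(m_rpkm[g], 6), round(a_cpm[g] - a_rpkm[g], 6))
```

So switching from CPM to RPKM only slides genes left or right along the A axis according to their length, which says nothing about the DE results being visualized.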