Hi Aurelie,
It's not a silly question at all. TPM (like RPKM / FPKM / RPK, etc.) is a relative abundance measure. It accounts for the relative rate at which reads are sampled from transcripts. Intuitively, it answers the question: if I sampled 1 million transcripts (randomly and in an unbiased manner) from the set of all expressed transcripts, how many copies of each transcript would I see (in expectation)? As such, TPM does, in one sense, normalize for library size. If, for example, I doubled the number of reads deriving from each transcript, the TPMs would not change. This is by design, as the TPM factors out the actual number of mapped reads in the experiment to derive a rate. In this sense, the TPMs are normalized.
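To make the "doubling the reads" point concrete, here is a minimal sketch of the usual TPM calculation (per-transcript read rate, rescaled to sum to one million). The function name `tpm`, the counts, and the effective lengths are made up for illustration; this is not Sailfish's actual code.

```python
def tpm(counts, eff_lengths):
    """Compute TPM from read counts and effective lengths (illustrative only)."""
    # Reads per base of effective length for each transcript.
    rates = [c / l for c, l in zip(counts, eff_lengths)]
    total = sum(rates)
    # Rescale so the values sum to one million.
    return [r / total * 1e6 for r in rates]


# Hypothetical toy data: three transcripts.
counts = [100, 300, 600]
eff_lengths = [1000.0, 1500.0, 3000.0]

base = tpm(counts, eff_lengths)
# Double the number of reads deriving from every transcript.
doubled = tpm([2 * c for c in counts], eff_lengths)

for b, d in zip(base, doubled):
    print(f"{b:.1f}\t{d:.1f}")
```

The two columns should agree up to floating-point rounding, because the total number of reads cancels out when the rates are rescaled.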
However, this is almost certainly not what you mean when you ask for normalization by library size. With the exception of limma / voom, every tool I know of for differential expression will use the counts (or estimated counts) of reads mapping to each transcript in each condition. Given this "raw" input, different tools will then apply different normalization techniques to explicitly account for the fact that different samples may have a different number of reads or a different number of mapped reads. Thus, to use Sailfish with other tools for downstream differential expression, you can make use of the "NumReads" field (the last column in the quant.sf files). For specific tips (and a useful software package) on using Sailfish with downstream differential expression tools, I recommend you take a look at this paper.
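As a rough illustration of pulling out the "NumReads" values, the sketch below reads the last column of a quant.sf file into a dictionary. It assumes a tab-separated file whose first column is the transcript name and whose header row (if present) starts with "Name" or "#"; the exact layout can differ between Sailfish versions, so check your own files. The helper name `read_num_reads` and the demo contents are hypothetical.

```python
import csv
import os
import tempfile


def read_num_reads(path):
    """Map transcript name -> value in the last column (NumReads)."""
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines, comment lines, and the header row (assumed layout).
            if not row or row[0].startswith("#") or row[0] == "Name":
                continue
            counts[row[0]] = float(row[-1])
    return counts


# Made-up demo file so the example is self-contained.
demo = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "txA\t1200\t1000.0\t200000.0\t100.0\n"
    "txB\t1700\t1500.0\t400000.0\t300.0\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "quant.sf")
    with open(path, "w") as fh:
        fh.write(demo)
    num_reads = read_num_reads(path)

print(num_reads)
```

Running this once per sample gives you the per-transcript (estimated) counts that count-based differential expression tools expect as input; the normalization for library size is then left to the tool itself.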
modified 12 months ago by _r_am ♦ 32k • written 5.0 years ago by Rob ♦ 4.6k