Question: What are some references of works that use a TPM expression threshold for filtering samples/genes?
mike-zx wrote, 6 weeks ago:

I'm struggling to find papers that use transcripts per million (TPM) in their pre-processing steps to filter out non-expressed or very lowly expressed genes. I'm aware that filtering is usually recommended on raw read counts, since they provide more information for the decision, but sometimes the raw read counts are simply not available. What interests me most is what authors consider expressed (say, a TPM of at least 1, or of at least 5) and what they consider a lowly expressed gene (say, a gene for which x percent of samples fail the expression criterion). I know of the heuristic that a TPM of 5 corresponds roughly to one transcript being present in a cell at any given time, but I haven't seen this mentioned in any citable work.
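To make the kind of rule I mean concrete, here is a minimal sketch of a "fraction of samples" filter. The function name `is_expressed`, the cutoffs (TPM of at least 1 in at least 20% of samples), and the toy values are all made up for illustration; none of them come from a published work.

```python
def is_expressed(tpms, min_tpm=1.0, min_fraction=0.2):
    """Return True if at least `min_fraction` of samples reach `min_tpm`.

    Both cutoffs are illustrative defaults, not recommendations.
    """
    if not tpms:
        return False
    n_pass = sum(1 for value in tpms if value >= min_tpm)
    return n_pass / len(tpms) >= min_fraction


# Hypothetical toy data: gene -> TPM value in each of five samples.
tpm_table = {
    "geneA": [0.0, 0.2, 0.1, 0.0, 0.3],      # never reaches 1 TPM
    "geneB": [0.4, 1.5, 0.9, 2.0, 0.7],      # reaches 1 TPM in 2 of 5 samples
    "geneC": [12.0, 8.5, 15.2, 9.9, 11.1],   # clearly expressed
}

kept = [gene for gene, tpms in tpm_table.items() if is_expressed(tpms)]
print(kept)
```

Raising `min_tpm` to 5 or changing `min_fraction` is exactly the kind of choice I would like to see justified in a citable paper.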

So far I've managed to find this article, which investigates tibial nerve samples available in the GTEx project. The authors filter out genes with a median TPM less than 0.5 or a maximum TPM less than 1 across samples. The GTEx project is a good example of a situation where you would want to filter by TPM, since the raw read counts have already been processed to a high standard and researchers may pick up the TPM values from the start. Does anyone know of more papers in which filtering is applied directly to TPM values?
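As I read that article's rule, it could be sketched as below. This is my own reconstruction rather than the authors' code: the function name `passes_filter` and the toy TPM values are hypothetical, and only the two cutoffs (median 0.5, maximum 1) come from the description above.

```python
from statistics import median


def passes_filter(tpms, min_median=0.5, min_max=1.0):
    """Keep a gene unless its median TPM < min_median or its max TPM < min_max."""
    return median(tpms) >= min_median and max(tpms) >= min_max


# Hypothetical toy data: gene -> TPM value in each of five samples.
tpm_table = {
    "geneA": [0.0, 0.2, 0.1, 0.0, 0.3],    # median 0.1: removed
    "geneB": [0.4, 1.5, 0.9, 2.0, 0.7],    # median 0.7, max 2.0: kept
    "geneD": [0.6, 0.7, 0.8, 0.9, 0.6],    # median 0.7 but max 0.9: removed
    "geneE": [0.0, 0.0, 0.1, 30.0, 0.2],   # one high sample, median 0.1: removed
}

kept = [gene for gene, tpms in tpm_table.items() if passes_filter(tpms)]
print(kept)
```

Note that a gene with a single high-TPM sample (geneE here) is still dropped by the median criterion, which may or may not be what you want for tissue- or condition-specific genes.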


If I recall correctly, TPM is one of the main outputs of kallisto. Maybe try looking at papers that use kallisto.

Reply written 6 weeks ago by curious

Related discussion: TPM values of expressed genes

Reply written 29 days ago by igor

dsull (UCLA) wrote, 29 days ago:

In my opinion, it's impossible to reliably determine what exactly may be non-expressed or lowly-expressed without further experiments (e.g. those including spike-ins). We only have heuristics -- which are probably extrapolated from previously published works/experiments (oftentimes without citation), but there is no golden rule and I can't think of any studies that actually reliably validate these heuristics.

See the following blog post (from the kallisto author) for a discussion of where the idea of using such thresholds might have come from: https://liorpachter.wordpress.com/2014/04/30/estimating-number-of-transcripts-from-rna-seq-measurements-and-why-i-believe-in-paywall/
