Filtering transcripts by transcript support level (TSL)
jth, 8.4 years ago

Hi,

I have a question about filtering Ensembl transcripts by transcript support level (TSL). I am currently collecting Ensembl transcripts for three separate purposes:

  1. Calculating nucleotide distributions on exons, introns, and UTRs separately (from canonical transcripts to avoid redundancy).
  2. For a given genomic location, providing an annotation based on location in a gene model.
  3. RNA quantification (I have recently read this: https://cgatoxford.wordpress.com/2015/10/21/improving-kallisto-quantification-accuracy-by-filtering-the-gene-set/)
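For the second purpose, a position can be labelled by where it falls in a transcript model. The sketch below assumes a hypothetical `TranscriptModel` record with 1-based, inclusive (GTF-style) coordinates and a single genomic CDS span. It handles one transcript at a time, so choosing among overlapping transcripts (for example, preferring the canonical one) is left to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TranscriptModel:
    """Hypothetical minimal transcript model; coordinates are 1-based and inclusive."""
    transcript_id: str
    strand: str                            # "+" or "-"
    exons: List[Tuple[int, int]]           # (start, end) of each exon
    cds: Optional[Tuple[int, int]] = None  # genomic (start, end) of the coding region; None if non-coding


def classify_position(tx: TranscriptModel, pos: int) -> str:
    """Label a genomic position relative to a single transcript model."""
    tx_start = min(s for s, _ in tx.exons)
    tx_end = max(e for _, e in tx.exons)
    if pos < tx_start or pos > tx_end:
        return "outside"
    if not any(s <= pos <= e for s, e in tx.exons):
        return "intron"
    if tx.cds is None:
        return "noncoding_exon"
    cds_start, cds_end = tx.cds
    if cds_start <= pos <= cds_end:
        return "CDS"
    # Exonic but outside the CDS: 5' vs 3' UTR depends on the strand.
    upstream = pos < cds_start
    if tx.strand == "-":
        upstream = not upstream
    return "5'UTR" if upstream else "3'UTR"
```

For a plus-strand transcript with exons (100, 200) and (300, 400) and a CDS spanning 150 to 350, position 120 is labelled `5'UTR` and position 250 `intron`; on the minus strand, the same position 120 becomes `3'UTR`.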

I am inclined to filter out TSL 4 (the best supporting EST is flagged as suspect) and TSL 5 (no single transcript supports the model structure) for all purposes to provide more accurate distributions, annotations, quantification, etc.
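A minimal sketch of the proposed filter, assuming an Ensembl GTF whose `transcript` lines carry a `transcript_support_level` attribute with values such as `"1"` or `"NA"`, sometimes followed by a note in parentheses. The function names are hypothetical, and keeping `NA` transcripts by default is an assumption that the question does not settle.

```python
import re
from typing import Dict, FrozenSet, Iterable, List, Optional, Set

# key "value" pairs in GTF column 9
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field: str) -> Dict[str, List[str]]:
    """Parse a GTF attribute column; keys such as `tag` may repeat, so values are lists."""
    attrs: Dict[str, List[str]] = {}
    for key, value in ATTR_RE.findall(field):
        attrs.setdefault(key, []).append(value)
    return attrs


def tsl_of(attrs: Dict[str, List[str]]) -> Optional[int]:
    """Return the TSL as an int (1-5), or None when it is missing or "NA"."""
    values = attrs.get("transcript_support_level")
    if not values:
        return None
    m = re.match(r"\s*(\d)", values[0])  # tolerates trailing notes in parentheses
    return int(m.group(1)) if m else None


def transcripts_passing_tsl(gtf_lines: Iterable[str],
                            drop: FrozenSet[int] = frozenset({4, 5}),
                            keep_na: bool = True) -> Set[str]:
    """Collect IDs of transcripts whose TSL is not in `drop`."""
    kept: Set[str] = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        tsl = tsl_of(attrs)
        if tsl is None:
            if not keep_na:
                continue
        elif tsl in drop:
            continue
        kept.add(attrs["transcript_id"][0])
    return kept
```

The returned ID set could then be used to subset the transcript FASTA before building a quantification index, or to restrict the other two analyses to the same transcripts.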

When I filter according to these criteria, 50,672 transcripts (out of 191,632) and 5,401 canonical transcripts (out of 57,387) are eliminated from the autosomes. Among the eliminated transcripts, 22,035 (~29% of all protein-coding transcripts) and 1,941 canonical transcripts (~10% of all protein-coding canonical transcripts) are protein-coding. Since these numbers are fairly high and may especially affect the first purpose, I have become suspicious of this strategy. On top of that, the post I linked above reports an interesting result for quantification as well, which has left me more confused.
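Counts like these can be reproduced for any candidate filter with a small tally. For reference, the quoted overall numbers correspond to roughly 26% of all transcripts (50,672 / 191,632) and 9% of canonical transcripts (5,401 / 57,387). The `TxInfo` record below is hypothetical; its `biotype` and `canonical` fields would have to be filled from the annotation (for example the `transcript_biotype` attribute and, in newer Ensembl GTFs, the `Ensembl_canonical` tag).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class TxInfo:
    """Hypothetical per-transcript record."""
    transcript_id: str
    biotype: str      # e.g. "protein_coding"
    canonical: bool   # whether this is the gene's canonical transcript


def removal_summary(transcripts: Iterable[TxInfo],
                    kept_ids: Set[str]) -> Dict[str, Tuple[int, int, float]]:
    """For each group, return (removed, total, removed / total)."""
    groups = {
        "all": lambda t: True,
        "canonical": lambda t: t.canonical,
        "protein_coding": lambda t: t.biotype == "protein_coding",
        "protein_coding_canonical": lambda t: t.canonical and t.biotype == "protein_coding",
    }
    txs = list(transcripts)
    summary = {}
    for name, in_group in groups.items():
        members = [t for t in txs if in_group(t)]
        removed = sum(1 for t in members if t.transcript_id not in kept_ids)
        total = len(members)
        summary[name] = (removed, total, removed / total if total else 0.0)
    return summary
```

Comparing these fractions for several candidate filters (TSL only, TSL plus rescue rules, and so on) makes it easier to judge how much each one costs before committing to it.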

So, do you think this type of filtering is appropriate for these purposes, or is it an overly conservative and/or unnecessary approach?

Thanks!

ensembl genome sequence transcripts

Filtering out TSL 4 and 5 seems reasonable.

Vasisht, 7.2 years ago

Filtering out TSL 4 and 5 will also filter out genes such as JAK1 and SMAD4, which have RefSeq and CCDS transcripts even though none of their principal isoforms is TSL 1 through 3. It may be better to filter using APPRIS or by overlap with RefSeq/CCDS.
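One way to combine both ideas is to drop TSL 4/5 transcripts only when nothing else vouches for them. The sketch below assumes GENCODE-style `tag` values (`CCDS` and `appris_principal_*`) in the GTF attribute column; check which tags your release actually carries. It uses the CCDS tag as a stand-in for CCDS overlap; matching against RefSeq would need a separate ID mapping that is not shown. The function name and the choice to keep `NA` transcripts are hypothetical.

```python
import re
from typing import Dict, FrozenSet, List

# key "value" pairs in GTF column 9
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def keep_transcript(attr_field: str,
                    drop_tsl: FrozenSet[int] = frozenset({4, 5})) -> bool:
    """Keep a transcript unless it has TSL 4/5 AND lacks a CCDS or APPRIS-principal tag."""
    attrs: Dict[str, List[str]] = {}
    for key, value in ATTR_RE.findall(attr_field):
        attrs.setdefault(key, []).append(value)

    # Rescue rule (assumed tag names): CCDS-backed or APPRIS principal isoforms are always kept.
    tags = set(attrs.get("tag", []))
    if "CCDS" in tags or any(t.startswith("appris_principal") for t in tags):
        return True

    tsl_values = attrs.get("transcript_support_level", [])
    m = re.match(r"\s*(\d)", tsl_values[0]) if tsl_values else None
    if m is None:
        return True  # missing or "NA" TSL: kept here by assumption
    return int(m.group(1)) not in drop_tsl
```

With this rule, a TSL 5 transcript tagged `CCDS` survives while an untagged TSL 5 transcript is dropped, which addresses the JAK1/SMAD4-type cases without giving up the TSL filter elsewhere.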
