Question: Filtering transcripts by transcript support level (TSL)
jth200 (Turkey) wrote, 3.9 years ago:

Hi,

I have a question about filtering Ensembl transcripts by transcript support level (TSL). Currently, I am collecting Ensembl transcripts for three separate purposes:

  1. Calculating nucleotide distributions on exons, introns, and UTRs separately (from canonical transcripts to avoid redundancy).
  2. For a given genomic location, providing an annotation based on location in a gene model.
  3. RNA quantification (I have recently read this: https://cgatoxford.wordpress.com/2015/10/21/improving-kallisto-quantification-accuracy-by-filtering-the-gene-set/)

I am inclined to filter out TSL 4 (the best supporting EST is flagged as suspect) and TSL 5 (no single transcript supports the model structure) for all purposes to provide more accurate distributions, annotations, quantification, etc. 
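For concreteness, the filter described above could be sketched as follows. This is a minimal sketch, assuming an Ensembl/GENCODE-style GTF in which transcript rows carry a `transcript_support_level` attribute whose value starts with the level (for example `"1"`, `"NA"`, or `"4 (assigned to previous version 2)"`). The function names, the helper `make_line`, and the IDs `G1` and `T1` to `T5` are hypothetical. Transcripts with a missing or `NA` level are kept here, which is a choice and not something TSL itself prescribes.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(attr_field):
    """Map each attribute key to the list of its values (keys may repeat)."""
    attrs = {}
    for key, value in ATTR_RE.findall(attr_field):
        attrs.setdefault(key, []).append(value)
    return attrs


def tsl_of(attrs):
    """Return the TSL as an int, or None when it is missing or 'NA'."""
    raw = attrs.get("transcript_support_level", [None])[0]
    if raw is None:
        return None
    match = re.match(r"\d+", raw)  # tolerate a trailing "(assigned to ...)" note
    return int(match.group()) if match else None


def keep_by_tsl(gtf_lines, rejected=(4, 5)):
    """Return IDs of transcript rows whose TSL is not in `rejected`."""
    kept = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript rows are inspected
        attrs = parse_attributes(fields[8])
        if tsl_of(attrs) in rejected:
            continue
        kept.append(attrs["transcript_id"][0])
    return kept


def make_line(transcript_id, tsl):
    """Build a toy GTF transcript row (hypothetical data, for illustration)."""
    attrs = f'gene_id "G1"; transcript_id "{transcript_id}";'
    if tsl is not None:
        attrs += f' transcript_support_level "{tsl}";'
    return "\t".join(["1", "havana", "transcript", "100", "900", ".", "+", ".", attrs])


toy_gtf = [
    make_line("T1", "1"),
    make_line("T2", "5"),
    make_line("T3", "NA"),
    make_line("T4", "4 (assigned to previous version 2)"),
    make_line("T5", None),
]
print(keep_by_tsl(toy_gtf))
```

On the toy rows, `T2` and `T4` are dropped and the rest are kept. Exon, CDS, and UTR rows would then be selected by the retained transcript IDs.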

When I filter according to these criteria, 50,672 transcripts (total: 191,632) and 5,401 canonical transcripts (total: 57,387) are eliminated from the autosomes. Among the eliminated transcripts, 22,035 transcripts (~29% of all protein-coding transcripts) and 1,941 canonical transcripts (~10% of all protein-coding canonical transcripts) are protein coding. Since these numbers are rather high and may particularly affect the first purpose, I have become a bit suspicious of this strategy. The link I provided above also shows an interesting result for quantification, which left me more confused.

So, would you think this type of filtering is appropriate for the given purposes, or is it an over-conservative and/or unnecessary approach?

Thanks!

 


Filtering out TSL4 or 5 seems reasonable.

written 3.8 years ago by Rm7.9k
Vasisht170 wrote, 2.7 years ago:

Filtering out TSL4 and 5 will also remove genes such as JAK1 and SMAD4, which have RefSeq and CCDS transcripts but no principal isoform with a TSL of 1 through 3. It may be better to filter via APPRIS, or to use an overlap with RefSeq/CCDS.
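One way to combine the two ideas is to reject TSL 4/5 transcripts unless they carry a tag that marks independent support. The following is a rough sketch only: it assumes each transcript's tags are available as a list of strings, with `CCDS` and `appris_principal...` tag names as they appear in GENCODE-style GTF files (whether a given annotation release provides them should be checked). The function names and the choice of rescue tags are illustrative, and a RefSeq overlap would need a separate lookup not shown here.

```python
# Tags treated as evidence strong enough to override a poor TSL.
# Assumption: tag names follow GENCODE-style GTF conventions.
RESCUE_TAGS = ("CCDS",)
RESCUE_PREFIXES = ("appris_principal",)


def is_rescued(tags):
    """True if any tag marks the transcript as CCDS or APPRIS principal."""
    return any(t in RESCUE_TAGS or t.startswith(RESCUE_PREFIXES) for t in tags)


def keep_transcript(tsl, tags, rejected=(4, 5)):
    """Keep a transcript unless its TSL is rejected and no tag rescues it.

    `tsl` is an int, or None when the level is missing or 'NA'.
    """
    if tsl not in rejected:
        return True
    return is_rescued(tags)


# Toy cases (hypothetical tag lists, not real annotation records).
print(keep_transcript(5, ["basic", "CCDS"]))        # rescued by CCDS
print(keep_transcript(4, ["appris_principal_1"]))   # rescued by APPRIS
print(keep_transcript(5, ["basic"]))                # dropped
```

With this rule, a transcript is removed only when both the TSL and the tag evidence are weak, so genes whose principal isoforms are all TSL 4 or 5 are not lost entirely.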
