We recently noticed that the current Ensembl annotation contains many very short isoforms that probably do not represent functional transcripts but inflate the apparent diversity of the transcriptome (it may include intron-retention isoforms too).
I am looking for a programmatic way to produce a subset of Ensembl that is closer to what is provided in ENCODE, while letting me control the level of evidence I want to keep.
I found information in Ensembl about TSL (Transcript Support Level), but I cannot find TSL exposed in BioMart, nor any examples of R biomaRt commands that use this annotation for filtering (although head(organismAttributes("Homo sapiens"), 20) does return transcript_tsl at position 15).
Downloading the full GENCODE build is not what I need here; I want to create my own subsets.
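To make the question concrete, here is roughly what I am hoping to do. It is only a sketch: it assumes that transcript_tsl, transcript_biotype and transcript_length can be requested as getBM attributes, and that TSL values come back as strings such as "tsl1" or "tsl1 (assigned to previous version 5)", neither of which I have confirmed. The helper names (parse_tsl, filter_by_tsl, fetch_transcripts) and the toy transcript IDs are made up for illustration.

```r
# Turn TSL strings into integers: "tsl1" -> 1, "tsl3 (assigned ...)" -> 3.
# Anything without a numeric level ("tslNA", "", NA) becomes NA.
parse_tsl <- function(x) {
  level <- sub("^tsl([0-9]+).*$", "\\1", x)
  level[!grepl("^tsl[0-9]+", x)] <- NA
  as.integer(level)
}

# Keep transcripts with a known TSL at or below max_tsl (lower = better
# supported) and a length of at least min_length bases.
filter_by_tsl <- function(tx, max_tsl = 2, min_length = 0) {
  level <- parse_tsl(tx$transcript_tsl)
  keep <- !is.na(level) & level <= max_tsl &
    tx$transcript_length >= min_length
  tx[which(keep), , drop = FALSE]
}

# Hypothetical download step (needs biomaRt and network access).
# The attribute names are assumptions; listAttributes(mart) should show
# what is actually available.
fetch_transcripts <- function(dataset = "hsapiens_gene_ensembl") {
  mart <- biomaRt::useEnsembl(biomart = "genes", dataset = dataset)
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "ensembl_transcript_id",
                   "transcript_biotype", "transcript_length",
                   "transcript_tsl"),
    mart = mart
  )
}

# Toy table standing in for the getBM result.
toy <- data.frame(
  ensembl_transcript_id = c("T1", "T2", "T3", "T4"),
  transcript_length = c(2500L, 400L, 1800L, 900L),
  transcript_tsl = c("tsl1", "tsl5",
                     "tsl2 (assigned to previous version 3)", "tslNA"),
  stringsAsFactors = FALSE
)

kept <- filter_by_tsl(toy, max_tsl = 2, min_length = 500)
print(kept$ensembl_transcript_id)  # T1 and T3 survive the filter
```

With real data I would replace toy with fetch_transcripts() and tune max_tsl and min_length to make the subset stricter or looser.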
Thanks in advance