We recently noticed that the current Ensembl data is full of very short isoforms that likely do not represent functional transcripts but lead to overestimating the diversity of the transcriptome (and may include intron-retention isoforms too).
I am looking for a programmatic way to reproduce a subset of Ensembl closer to what is provided in ENCODE, but one that would let me control the degree of evidence I wish to keep.
I found information in Ensembl about TSL (Transcript Support Level), but I cannot find TSL exposed in BioMart, nor examples of [R] biomaRt commands that use this annotation for filtering (although head(organismAttributes("Homo sapiens"), 20) returns transcript_tsl at position #15).
Does anybody have biomaRt code examples to fetch only human transcripts with the annotation 'GENCODE basic', as exemplified in the screenshot of the ABLIM1 gene?
Downloading the full GENCODE build is not what I need here; I want to create my own subsets.
Thanks in advance
Stephane
Thanks for this Emily,
Can you please be more explicit? I do not see GENCODE or TSL in the drop-down under gene in BioMart.
Also, what about my specific request to do this programmatically using biomaRt in R?
Best Stephane
Did you try scrolling down the page?
SHAME on ME :-) it is clearly Friday.
One last thing: what about doing this in R?
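For the R part, here is a minimal sketch rather than a definitive recipe. It assumes the human gene mart exposes the attributes transcript_tsl and transcript_gencode_basic (worth confirming with listAttributes() on your Ensembl release), that TSL values look like "tsl1" or "tsl1 (assigned to previous version 5)", and that the GENCODE basic column is non-empty only for tagged transcripts. The helper names fetch_transcripts() and filter_transcripts() are made up for illustration; the filtering is done locally so you control the evidence threshold yourself.

```r
# Fetch all transcripts of one gene with their TSL and GENCODE basic tags.
# Needs the biomaRt package and network access; attribute names are
# assumptions to verify with biomaRt::listAttributes(mart).
fetch_transcripts <- function(gene_symbol) {
  mart <- biomaRt::useEnsembl(biomart = "genes",
                              dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "external_gene_name",
                   "ensembl_transcript_id", "transcript_biotype",
                   "transcript_length", "transcript_tsl",
                   "transcript_gencode_basic"),
    filters = "external_gene_name",
    values = gene_symbol,
    mart = mart
  )
}

# Keep transcripts whose support level is at most max_tsl (1 = best),
# optionally restricted to those tagged GENCODE basic.
filter_transcripts <- function(tx, max_tsl = 1, basic_only = TRUE) {
  # Pull the leading number out of "tsl1 (assigned to ...)";
  # "tslNA", "" and NA all become NA and are dropped.
  level <- suppressWarnings(
    as.integer(sub("^tsl([0-9]+).*$", "\\1", tx$transcript_tsl))
  )
  keep <- !is.na(level) & level <= max_tsl
  if (basic_only) {
    basic <- tx$transcript_gencode_basic
    keep <- keep & !is.na(basic) & nzchar(basic)
  }
  tx[keep, , drop = FALSE]
}
```

With that in place, something like tx <- fetch_transcripts("ABLIM1") followed by filter_transcripts(tx, max_tsl = 2) should give the GENCODE basic transcripts with TSL 1 or 2; set basic_only = FALSE to filter on TSL alone.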