I'm working with RNA-seq data from S. cerevisiae. I checked the annotation files from SGD and Ensembl (SGD provides the annotations to Ensembl; the files aren't identical because SGD appends the sequences at the bottom of the GTF/GFF), and to my surprise, there aren't any gene isoforms/splice variants defined in these annotation files. I've checked the latest releases from Ensembl and SGD to be sure. This is a bit surprising because there is literature describing multiple gene isoforms for this organism. Can anyone with experience working with S. cerevisiae confirm this?
I've tried multiple annotation files; the latest are:
"Saccharomyces_cerevisiae.R64-1-1.96.gff3.gz"
"Saccharomyces_cerevisiae.R64-1-1.96.gtf.gz"
Which are basically the same.
I have some simple code written in Python where I take every transcript/mRNA line from the annotation file, extract its "parent gene", and count how many times I've seen that parent gene. Different mRNA lines that share the same parent gene are therefore alternative isoforms of the same gene.
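For readers who want to reproduce the check, here is a minimal sketch of what such a function might look like. It is not the original code: the function body, the `feature_types` parameter, and the choice to return a `Counter` are assumptions made for illustration. It only assumes a tab-separated GFF3 layout with the feature type in column 3 and the attributes in column 9.

```python
import gzip
import re
from collections import Counter


def children_transcript_count(path, parent_pattern,
                              feature_types=("mRNA", "transcript")):
    """Count transcript-level features per parent gene in a GFF3 file.

    Hypothetical reimplementation: `feature_types` is an assumed parameter
    listing the column-3 values treated as transcripts.
    """
    # Case-insensitive, so 'parent=' also matches the 'Parent=' attribute.
    regex = re.compile(parent_pattern, re.IGNORECASE)
    counts = Counter()
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header and directive lines
            fields = line.rstrip("\n").split("\t")
            # Lines without 9 columns (e.g. appended FASTA) are skipped.
            if len(fields) < 9 or fields[2] not in feature_types:
                continue
            match = regex.search(fields[8])
            if match:
                counts[match.group(1)] += 1
    return counts
```

Printing `counts.most_common()` lists the genes with the most transcripts first. One caveat with the pattern used here: `\w+` does not match hyphens, so gene IDs that contain one (for example names of the form YAL068W-A) are silently skipped by `parent=gene:(\w+);`.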
children_transcript_count("Saccharomyces_cerevisiae.R64-1-1.96.gff3.gz", r'parent=gene:(\w+);')
YDL105W 1
YNL128W 1
YJR033C 1
YER037W 1
YMR173W 1 ...
Running the same code on another organism's annotation, I got this:
children_transcript_count("Caenorhabditis_elegans.WBcel235.96.gff3.gz", r'parent=gene:(\w+);')
WBGene00001340 83
WBGene00006439 57
WBGene00004161 51
WBGene00006779 36
WBGene00006784 33
WBGene00001184 32
WBGene00016269 32
I've used this code before with other annotations (the search is case-insensitive too). It's just unexpected to me that S. cerevisiae doesn't have any splice variants defined.
Hey Jean, thanks for your answer.
I saw that. It's very new, so I didn't expect it to be part of the "official" annotation, but there are other papers from 15 years ago describing alternative isoforms. They aren't high-throughput, but there are still plenty of well-described examples. A paper from 2015 reports that about 4% of the genes have isoforms, yet the annotation files were updated in 2018 and have 0 according to my code. I expected at least a few genes with a couple of isoforms, and I don't have anyone around here to confirm this. I'm also surprised because we are talking about a model organism, so I expected an up-to-date, "high quality" annotation file.
As you've found, yeast doesn't appear to have much alternative splicing compared to other eukaryotes. Also, yeast being primarily a model organism for genetics, I suspect that there's been little interest in isoforms.
By the way, use the 'add comment' button to reply to a post.