I am looking for a simple algorithm to translate transcript names into gene names. Here are a few examples of how transcript names seem to relate to gene names:
Example 1: http://www.wormbase.org/species/c_elegans/transcript/Y74C9A.2.2#042--10 The gene name is Y74C9A.2 and the transcript name is Y74C9A.2.2.
Example 2: http://www.wormbase.org/species/c_elegans/cds/B0304.1a#04--10 The gene name is B0304.1 and the transcript name is B0304.1a.
Example 3: http://www.wormbase.org/species/c_elegans/transcript/T07H6.5#042--10 The gene name is T07H6.5 and the transcript name is also T07H6.5.
If these are all the possibilities, I think I could write a simple script to get the gene name. My problem is that I don't know whether these examples cover all the possibilities.
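A script that handles just the three patterns in the examples could look like the minimal Python sketch below. The function name transcript_to_gene is hypothetical, and the rules come only from the three examples, not from any official WormBase naming rule, so unseen patterns may be mapped incorrectly.

```python
import re

# Hypothetical helper built only from the three example patterns above;
# it does not implement the official WormBase naming rules.
def transcript_to_gene(transcript: str) -> str:
    name = transcript
    # Pattern 1: extra ".N" transcript number, e.g. Y74C9A.2.2 -> Y74C9A.2
    if name.count(".") >= 2:
        name = name.rsplit(".", 1)[0]
    # Pattern 2: trailing lowercase letter after a digit, e.g. B0304.1a -> B0304.1
    name = re.sub(r"(?<=\d)[a-z]$", "", name)
    # Pattern 3: the name is already a gene name, e.g. T07H6.5, and is unchanged
    return name


for t in ["Y74C9A.2.2", "B0304.1a", "T07H6.5"]:
    print(t, "->", transcript_to_gene(t))
```

Because the ".N" suffix is stripped before the trailing letter, a combined form such as B0304.1a.1 (a hypothetical input here) would also reduce to B0304.1.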
You should first read the worm gene naming convention. Your first example is probably wrong: Y74C9A.2 looks like the gene name. Then it seems that worm genes are named according to the pattern
/^[A-Z][A-Z0-9]+\.[0-9]+$/
If you can confirm it from the official documentation, you have the rule.

@lh3: I fixed it.
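Assuming that pattern holds, one way to apply it is to drop the trailing $ anchor and take the leading part of the transcript name that matches it. A minimal Python sketch; the names GENE_PREFIX, is_gene_name and gene_from_transcript are hypothetical, and the pattern itself still needs confirming against the official naming documentation.

```python
import re

# The pattern from the answer, without the end anchor, so it can match the
# gene-name prefix of a longer transcript name. Unconfirmed against WormBase docs.
GENE_PREFIX = re.compile(r"[A-Z][A-Z0-9]+\.[0-9]+")


def is_gene_name(name: str) -> bool:
    """True if the whole string matches the gene-name pattern."""
    return GENE_PREFIX.fullmatch(name) is not None


def gene_from_transcript(transcript: str):
    """Return the leading gene-name part of a transcript name, or None."""
    m = GENE_PREFIX.match(transcript)  # match() anchors at the start only
    return m.group(0) if m else None


for t in ["Y74C9A.2.2", "B0304.1a", "T07H6.5"]:
    print(t, "->", gene_from_transcript(t))
```

Since [A-Z0-9]+ cannot cross a dot and [0-9]+ stops at the first non-digit, the match ends right after the first ".N", which covers all three examples. Names that do not start with the pattern return None.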