Question: Gff3 Coordinate: Find Stop Codon On - Strand
5.7 years ago by
Rvosa570
Leiden, the Netherlands
Rvosa570 wrote:

Given the following GFF3, where is the stop codon supposed to be:

```
scaffold1.1     maker   gene    247127  258737  .       -       .       ID=...
scaffold1.1     maker   CDS     258659  258737  .       -       1       ID=...
scaffold1.1     maker   CDS     254856  254976  .       -       2       ID=...
scaffold1.1     maker   CDS     251358  251395  .       -       1       ID=...
scaffold1.1     maker   CDS     250084  250198  .       -       2       ID=...
scaffold1.1     maker   CDS     248687  248760  .       -       1       ID=...
scaffold1.1     maker   CDS     247127  247239  .       -       0       ID=...
```

My reasoning so far has been:

• the last CDS is the one at 247127..247239 on the minus strand
• because we are reading from right to left, the stop codon is at 247127..247130
• also because we are on the minus strand, we need to reverse complement 247127..247130
• the coordinates are 1-based, so I need to subtract 1 for each coordinate for any language that has 0-based indexes
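
The steps in this list can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: `fetch`, `reverse_complement` and the toy sequence are hypothetical names and data, not part of the original annotation. It assumes the scaffold is held in memory as a plain string. Note that with Python's half-open slices only the start coordinate shifts by one; the 1-based inclusive end can be used as the slice end unchanged.

```python
# Hypothetical helpers; a toy string stands in for the real scaffold sequence.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fetch(scaffold, start, end, strand):
    """Return the bases of a 1-based, inclusive GFF3 interval,
    read 5'->3' on the given strand."""
    # 1-based inclusive [start, end] -> 0-based half-open [start - 1, end)
    sub = scaffold[start - 1:end]
    return reverse_complement(sub) if strand == "-" else sub


toy = "GGCTAAAA"
print(fetch(toy, 3, 5, "+"))  # CTA
print(fetch(toy, 3, 5, "-"))  # TAG
```

In the toy example the forward-strand bases at positions 3..5 are CTA, and only after reverse complementing do they read as the stop codon TAG.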

Here's my confusion:

• at 247127..247130 the sequence is GAT, so it's a reverse (but not complemented) stop codon. Is that right?
• am I supposed to do something with the phase values?

Isn't the sequence denoted by 247127..247130 of length 4, not 3?

Indeed it is, apologies. See how these coordinates are driving me crazy? Harumph. I meant to say 247127..247129

5.7 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

Your reasoning is correct: once the sequence is reverse complemented, the last codon should be the stop codon.

Also note that the last three bases will be 247127, 247128 and 247129; you should not include 247130!
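
A small sketch of the arithmetic, since GFF3 intervals are inclusive at both ends. The function names are made up for illustration, and the sketch assumes the annotation includes the stop codon in the last CDS:

```python
def inclusive_length(start, end):
    """Number of bases in a 1-based interval that includes both ends."""
    return end - start + 1


def terminal_codon_coords(cds_start, cds_end, strand):
    """1-based inclusive coordinates of the last codon of the final CDS.

    On the minus strand the reading direction runs from high to low
    coordinates, so the last codon sits at the low end of the feature.
    """
    if strand == "-":
        return cds_start, cds_start + 2
    return cds_end - 2, cds_end


print(inclusive_length(247127, 247130))             # 4
print(inclusive_length(247127, 247129))             # 3
print(terminal_codon_coords(247127, 247239, "-"))   # (247127, 247129)
```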

The phase indicates how many bases of the current CDS will complete the codon that started in the previous CDS. It does not affect the stop codon.
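
To make the phase rule concrete, here is a hedged sketch that derives phases from CDS lengths. It assumes the standard GFF3 definition (the phase is the number of bases to skip at the start of the feature to reach the first base of the next complete codon) and that the lengths are given in reading order, which on the minus strand means from the highest coordinates to the lowest. `cds_phases` and the example lengths are hypothetical.

```python
def cds_phases(lengths, first_phase=0):
    """Phases for consecutive CDS segments listed in reading order.

    lengths     -- segment lengths in bases, 5'->3' along the transcript
    first_phase -- phase of the first segment (0 for a complete CDS)
    """
    phases = []
    phase = first_phase
    for length in lengths:
        phases.append(phase)
        # Bases left over after the last complete codon in this segment.
        leftover = (length - phase) % 3
        # The next segment must supply whatever completes that codon.
        phase = (3 - leftover) % 3
    return phases


print(cds_phases([10, 5, 4]))  # [0, 2, 0]
```

In the example the first segment holds three codons plus one extra base, so the second segment has phase 2. When the first CDS has phase 0 and length `L`, the phase of the second CDS under this definition is `(3 - L % 3) % 3`, which is not the same as `L % 3` whenever `L % 3` is 1 or 2.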

Thank you very much for your reply, this is the first time where the 'phase' thing is starting to make sense. So is it then the case that, if we have only two CDSs in the same gene, then the phase of cds2 is going to be length(cds1) % 3?

ETA: if that's how it works then I can also see that the phase can't affect the stop codon, because for the stop codon we're just counting "backwards" from the last position.