Dear all,
I am trying to convert a GenBank file into Generic Feature Format version 3 (GFF3). According to the online GFF3 validator, the file I have prepared is OK, but there are issues with the phases of some of the CDS features. I have summarized them in these two tables:
entry  phase  start  end    delta  delta third
1      0      76014  76437  423    141.00
2      2      76532  76794  262    87.33
3      0      76901  77296  395    131.67
4      0      82169  82217  48     16.00
5      2      82326  83644  1318   439.33
6      0      83960  85309  1349   449.67
7      1      86174  88442  2268   756.00
8      0      88544  89010  466    155.33
9      1      89700  90945  1245   415.00
10     0      91042  91496  454    151.33
entry  phase  minus 1  minus 1 third  minus 2  minus 2 third
1      0      422      140.67         421      140.33
2      2      261      87.00          260      86.67
3      0      394      131.33         393      131.00
4      0      47       15.67          46       15.33
5      2      1317     439.00         1316     438.67
6      0      1348     449.33         1347     449.00
7      1      2267     755.67         2266     755.33
8      0      465      155.00         464      154.67
9      1      1244     414.67         1243     414.33
10     0      453      151.00         452      150.67
entry: the number of the CDS feature.
phase: the phase suggested by the validator.
start, end: the start and end positions of the CDS feature.
delta: end - start.
delta third: delta/3.
minus 1: delta - 1; minus 1 third: (delta - 1)/3.
minus 2: delta - 2; minus 2 third: (delta - 2)/3.
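The derived columns can be recomputed directly from the coordinates. Below is a minimal Python sketch; the column names follow the definitions above, while the extra `length` value is an addition that is not part of the tables. It is included because GFF3 coordinates are 1-based and inclusive, so a feature running from start to end spans end - start + 1 bases, one more than delta.

```python
# (start, end) of entries 1..10, copied from the first table.
CDS = [
    (76014, 76437), (76532, 76794), (76901, 77296), (82169, 82217),
    (82326, 83644), (83960, 85309), (86174, 88442), (88544, 89010),
    (89700, 90945), (91042, 91496),
]


def derived_columns(start: int, end: int) -> dict:
    """Recompute the table columns for one CDS segment."""
    delta = end - start  # 'delta' exactly as defined in the post
    return {
        "delta": delta,
        "delta third": round(delta / 3, 2),
        "minus 1": delta - 1,
        "minus 1 third": round((delta - 1) / 3, 2),
        "minus 2": delta - 2,
        "minus 2 third": round((delta - 2) / 3, 2),
        # Not in the tables: GFF3 coordinates are 1-based and inclusive,
        # so the segment actually contains delta + 1 bases.
        "length": end - start + 1,
    }


if __name__ == "__main__":
    for entry, (start, end) in enumerate(CDS, start=1):
        print(entry, derived_columns(start, end))
```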
Taking entry 1, its length (delta) is divisible by 3, so it makes sense that the validator accepted a phase of 0. The same holds for entry 4. Entries 6, 8 and 10 also get phase 0, even though their deltas (1349, 466 and 454) are not divisible by 3.
Entry 2 is not directly divisible by 3, so it is understandable that the validator flagged it. Shortening the feature by 1 nucleotide (minus 1) makes it divisible by 3, so I expected the phase to be 1, not 2. The same applies to entry 5.
Entry 3 is not divisible by 3 either, yet the validator did not flag it. It becomes divisible by 3 after removing two nucleotides, so I thought its phase should have been 2.
Entries 7 and 9 are divisible by 3, yet the validator gives them a phase of 1.
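The rule used above can be written down and checked against every row at once. This sketch assumes the rule is "phase = the number of nucleotides to drop from delta to make it divisible by 3", i.e. delta mod 3; the delta and phase values are copied from the first table, and the function names are hypothetical.

```python
# delta (end - start) and validator phase for entries 1..10, from the first table.
DELTAS = [423, 262, 395, 48, 1318, 1349, 2268, 466, 1245, 454]
VALIDATOR_PHASES = [0, 2, 0, 0, 2, 0, 1, 0, 1, 0]


def expected_phase(delta: int) -> int:
    """Phase predicted by the per-segment rule: nucleotides to drop so delta % 3 == 0."""
    return delta % 3


def disagreements(deltas, phases):
    """Return (entry, predicted, validator) for every row where the rule fails."""
    return [
        (entry, expected_phase(d), p)
        for entry, (d, p) in enumerate(zip(deltas, phases), start=1)
        if expected_phase(d) != p
    ]


if __name__ == "__main__":
    for entry, predicted, validator in disagreements(DELTAS, VALIDATOR_PHASES):
        print(f"entry {entry}: rule gives {predicted}, validator gives {validator}")
```

With these numbers the rule agrees with the validator only for entries 1 and 4, which suggests the phase is not determined by a segment's own length alone.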
I find this pretty confusing, and I haven't found many tutorials online on the subject.
How do I calculate the phase of the CDS based on the start-end positions?
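A sketch of one way to compute it, assuming the GFF3 specification's definition of phase (the number of bases to remove from the start of a CDS segment to reach the first base of the next codon) and assuming the segments of one CDS are listed in the order they are translated (ascending coordinates on the + strand, descending on the - strand). Under that definition a segment's phase depends on the lengths of the segments before it, not on its own length: with length_i = end_i - start_i + 1, the first segment of a complete CDS has phase 0 and phase_(i+1) = (phase_i - length_i) mod 3. For example, length_1 = 76437 - 76014 + 1 = 424, so phase_2 = (0 - 424) mod 3 = 2. The function name `gff3_phases` and its `first_phase` parameter are hypothetical, not part of any library.

```python
from typing import Iterable, List, Tuple


def gff3_phases(segments: Iterable[Tuple[int, int]], first_phase: int = 0) -> List[int]:
    """Compute the GFF3 phase of each CDS segment of one transcript.

    segments: (start, end) pairs, 1-based and inclusive, listed in the order
    they are translated (ascending for + strand, descending for - strand).
    first_phase: phase of the first segment; 0 when the CDS begins with a
    complete codon (a 5'-partial CDS may need 1 or 2).
    """
    phases = []
    phase = first_phase
    for start, end in segments:
        phases.append(phase)
        length = end - start + 1  # inclusive coordinates
        # After skipping `phase` bases, this segment ends with
        # (length - phase) % 3 bases of an unfinished codon; the next
        # segment must supply the rest, i.e. (phase - length) % 3 bases.
        phase = (phase - length) % 3
    return phases


# Coordinates from the first table (assumed to be on the + strand).
CDS = [
    (76014, 76437), (76532, 76794), (76901, 77296), (82169, 82217),
    (82326, 83644), (83960, 85309), (86174, 88442), (88544, 89010),
    (89700, 90945), (91042, 91496),
]

if __name__ == "__main__":
    print(gff3_phases(CDS[:6]))                  # [0, 2, 0, 0, 2, 0]
    print(gff3_phases(CDS[6:], first_phase=1))   # [1, 0, 1, 0]
```

On these coordinates the recurrence reproduces the validator's phases for entries 1-6, and for entries 8-10 when started from entry 7's phase of 1. Chained straight through, entry 7 would get phase 0 instead, so it may begin a separate (possibly 5'-partial) CDS; that is only a guess from the numbers and worth checking against the GenBank record.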
Thanks