Ensembl and Gene Prediction Tools output CDS' not divisible by 3
4.1 years ago
NickJD • 0

I have been looking at several ORF prediction tools such as Prodigal, and at GFF files from databases such as Ensembl, and both report genes/CDSs whose lengths do not appear to be divisible by 3. Examples below:

Chromosome  Prodigal_v2.6.2 CDS 686 1828    131.5   +   0   ID=1_1;partial=00
Chromosome  ena CDS 686 1828    .   +   0   ID=CDS:AAC71217

Are we supposed to count one end of the CDS differently from another?

1828 - 686 = 1142, and 1142 modulo 3 is 2.

Is there something I am not understanding?

Many thanks.

GFF CDS ORF • 1.3k views
4.1 years ago
Juke34 8.5k

GFF uses one-based coordinates, and both the start and the end position are part of the feature. See the Cheat Sheet For One-Based Vs Zero-Based Coordinate Systems. So the length is 1828 - 686 + 1 = 1143, and
1143 / 3 = 381
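
As a minimal sketch of that arithmetic, assuming a tab-separated GFF line with start and end in columns 4 and 5 (the helper names `cds_length` and `to_python_slice` are made up for illustration, not part of any tool):

```python
def cds_length(gff_line: str) -> int:
    """Length in bases of the feature described by one tab-separated GFF line."""
    fields = gff_line.rstrip("\n").split("\t")
    # Columns 4 and 5 hold start and end: 1-based, both inclusive.
    start, end = int(fields[3]), int(fields[4])
    return end - start + 1


def to_python_slice(start: int, end: int) -> tuple:
    """Convert 1-based inclusive GFF coordinates to a 0-based half-open slice."""
    return (start - 1, end)


line = "Chromosome\tProdigal_v2.6.2\tCDS\t686\t1828\t131.5\t+\t0\tID=1_1;partial=00"
length = cds_length(line)
print(length, length % 3, length // 3)  # 1143 0 381
print(to_python_slice(686, 1828))       # (685, 1828)
```

The second helper only matters if you want to cut the sequence out of a Python string: `seq[685:1828]` yields the same 1143 bases.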

Many thanks for the answers. While I guessed it was something like this, I could not find anything from the GFF database providers or the ORF prediction tools stating which coordinate system is being used for any particular file or dataset. I see the link notes that GFF files are 1-based. Is this true for all GFF files, and therefore something all ORF predictors 'SHOULD' conform to?

Thanks again.

Note: The website gave an error when I tried to submit a general comment so I responded to the first answer.

GFF is by definition 1-based, but you have no guarantee that every submitter follows this. If you search long enough, you will surely find a 0-based GFF; bioinformatics is a mess after all :)

I agree, but I would say this is quite unlikely. As I show here, the 1-based system is one of the rare things that has been well defined since the beginning of the format in 1997, i.e.: "Integers. <start> must be less than or equal to <end>. Sequence numbering starts at 1, so these numbers should be between 1 and the length of the relevant sequence, inclusive."
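
If you want to check a file against those constraints, a rough sketch could look like the following. The function name `check_gff_coords` and the sequence length of 2000 are hypothetical, chosen only for the example; a real check would take the length from the matching FASTA record.

```python
from typing import List


def check_gff_coords(start: int, end: int, seq_length: int) -> List[str]:
    """Return the ways a start/end pair breaks the quoted GFF rules."""
    problems = []
    if start < 1:
        # Numbering starts at 1, so a start of 0 hints at 0-based coordinates.
        problems.append("start < 1")
    if start > end:
        problems.append("start > end")
    if end > seq_length:
        problems.append("end > sequence length")
    return problems


# seq_length=2000 is a made-up value for illustration.
print(check_gff_coords(686, 1828, seq_length=2000))  # []
print(check_gff_coords(0, 1828, seq_length=2000))    # ['start < 1']
```

Passing this check does not prove a file is 1-based: a 0-based feature that does not start at the very first base satisfies the same rules, so a CDS length that is not a multiple of 3 remains the more telling symptom.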

4.1 years ago
ATpoint 82k

This is probably a 1-based file, so you need to add 1 to the difference, which gives 1143, and 1143 / 3 = 381. GFF files are in fact 1-based.

Cheat Sheet For One-Based Vs Zero-Based Coordinate Systems
