GFF3 file format
8.2 years ago
rakeshmbb • 0

Hi everyone. I am currently working with a GFF3 file. When a feature is on the minus strand, why is its start coordinate lower than its end coordinate? Shouldn't it be the reverse? For example, I would expect a gene on the minus strand to start at a higher coordinate than the one it ends at.

Please help, I am confused. What I actually want is to measure the intergenic distances between a set of genes for further analysis.

Thank you in advance

sequence • 4.3k views
8.2 years ago

The start/end coordinates are always given relative to the "+" strand, regardless of which strand the feature is on (i.e., for "-" strand features, the listed end is the biological start and the listed start is the biological end). This makes sorting and otherwise handling the files easier.
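To make this concrete, here is a minimal Python sketch. The GFF3 record and the helper name `five_prime_end` are made up for illustration; the only thing it relies on is the standard nine-column, tab-separated GFF3 layout (start in column 4, end in column 5, strand in column 7).

```python
# Hypothetical GFF3 record for a minus-strand gene (tab-separated, 9 columns).
line = "chr1\texample\tgene\t1000\t2500\t.\t-\t.\tID=geneA"

def five_prime_end(gff_line):
    """Return the coordinate at which the feature biologically begins."""
    cols = gff_line.rstrip("\n").split("\t")
    start, end, strand = int(cols[3]), int(cols[4]), cols[6]
    # start <= end always holds; the strand column says which one is the 5' end.
    return end if strand == "-" else start

print(five_prime_end(line))  # 2500
```

So the file still stores 1000 as start and 2500 as end, and the strand column is what tells you the gene actually begins at 2500.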


Thank you, Devon Ryan. I was thinking along the same lines, but I was not sure.

8.2 years ago
Michael 54k

Your GFF file is correct: by definition, the start coordinate must be less than or equal to the end coordinate, and all parsing libraries should handle the coordinates and strand correctly. In fact, this encoding makes substring extraction with standard functions more efficient, because sequence data are always stored in one direction only:

## Sketch (Python-style), given that start <= end is already guaranteed
## gff, chromosome and reverse_complement are assumed to exist
for feature in gff.features:
    # GFF3 coordinates are 1-based and inclusive; Python slices are 0-based and half-open
    subseq = chromosome[feature.start - 1:feature.end]
    if feature.strand == "-":
        subseq = reverse_complement(subseq)
## If start/end were not ordered and direction were identified by strand only
for feature in gff.features:
    start, end = sorted((feature.start, feature.end))
    ## this is the operation we save each time we extract a feature;
    ## however it is implemented, it costs 1 to 4 redundant
    ## machine operations:
    ## 1. a > b ?  2.-4. swap: tmp=a; a=b; b=tmp
    subseq = chromosome[start - 1:end]
    if feature.strand == "-":
        subseq = reverse_complement(subseq)

Because extractions are far more common than writing a GFF file (once vs. every time someone uses the genome file), we save 3 to 4 operations each time we access a feature: 3 if we keep the comparison as a sanity check (the swap never happens), 4 if we skip the check entirely. I am not saying that this really was the reason, but it might sound convincing.
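For anyone who wants to try the extraction itself, the following is a small self-contained Python sketch. The function names `reverse_complement` and `extract` and the toy sequence are my own placeholders, not part of any particular library; real code would more likely use an existing parser.

```python
# Translation table for complementing DNA (upper- and lower-case bases).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def extract(chromosome, start, end, strand):
    """Extract a feature given 1-based, inclusive GFF3 coordinates."""
    # No sorting needed: the format guarantees start <= end.
    subseq = chromosome[start - 1:end]  # convert to a 0-based, half-open slice
    return reverse_complement(subseq) if strand == "-" else subseq

chromosome = "AACCGGTTAC"  # toy sequence, not real data
print(extract(chromosome, 1, 4, "+"))  # AACC
print(extract(chromosome, 1, 4, "-"))  # GGTT
```

The same pair of coordinates is used for both strands; only the final reverse-complement step differs.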

8.2 years ago
Thibault D. ▴ 700

Hi rakeshmbb,

It is not required by the GFF3 specification, but most GFF3 files are sorted by ascending position. This ordering "reverses" the order of the features of genes on the minus strand.
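Since the coordinates are strand-independent, the intergenic distances you asked about can be computed from start/end alone. Below is a rough Python sketch; the function name `intergenic_distances` and the gene tuples are invented for illustration, and it assumes you have already pulled (seqid, start, end) out of the gene lines of your file. Overlapping or nested genes are simply skipped, which may or may not be what you want.

```python
def intergenic_distances(genes):
    """genes: iterable of (seqid, start, end), 1-based inclusive coordinates.

    Returns (seqid, left_end, right_start, distance) for neighbouring genes
    on the same sequence; distance counts the bases strictly between them.
    """
    gaps = []
    prev_seqid, prev_end = None, None
    for seqid, start, end in sorted(genes):  # sort by seqid, then by start
        if seqid == prev_seqid:
            if start > prev_end:
                gaps.append((seqid, prev_end, start, start - prev_end - 1))
            # Track the furthest end seen so far to cope with nested genes.
            prev_end = max(prev_end, end)
        else:
            prev_seqid, prev_end = seqid, end
    return gaps

# Made-up example genes; strand is deliberately ignored.
genes = [("chr1", 3000, 4200), ("chr1", 1000, 2500),
         ("chr1", 4100, 5000), ("chr2", 10, 90)]
print(intergenic_distances(genes))  # [('chr1', 2500, 3000, 499)]
```

If you need distances between 5' or 3' ends specifically, you would combine this with the strand column to decide which coordinate is which.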
