GFF3 header delimiter: a space or a tab?
2
1
3.2 years ago
microfuge ★ 1.9k

Dear All,

I could not find a source that states which field delimiter should be used in the GFF3 header. Can it be either a space or a tab, or must it be a space only? My hunch is a space.

##gff-version 3
##sequence-region 1 10


Many Thanks!

gff3 • 1.6k views
2

I don't think it even matters.

You could open it in vi and then run :set list to show all 'invisible' characters (^I is a tab).
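If you would rather check this from a script than from vi, here is a minimal Python sketch. The function name `directive_delimiters` and the sample lines are made up for illustration; it only reports which whitespace follows the directive name on each `##` line.

```python
import re

def directive_delimiters(lines):
    """Return (directive, separator) pairs for '##' lines that carry arguments."""
    found = []
    for line in lines:
        if not line.startswith("##"):
            continue  # feature lines and single-'#' comments are skipped
        # Directive name, then the run of spaces/tabs that follows it.
        match = re.match(r"##(\S+)([ \t]+)", line.rstrip("\n"))
        if match:
            found.append((match.group(1), match.group(2)))
    return found

# Hypothetical sample: one directive with a space, one with a tab.
sample = [
    "##gff-version 3\n",
    "##sequence-region\t1 1 10\n",
    "1\t.\tgene\t1\t10\t.\t+\t.\tID=g1\n",
]
for name, sep in directive_delimiters(sample):
    print(name, repr(sep))
```

For a real file you could pass an open file handle instead of the list, since it also yields lines.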

0

Thanks so much! This was a fake gff entry I created, just wanted to know if the official specification says something about it. Did not know about the set list option in vi (quite nice :) ).

0

Link to "official" (best I've found so far) specifications: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

1
3.2 years ago
Carambakaracho ★ 3.1k

This is a great question, and one reason why I 'love' the GFF format so much. It is not explicitly defined. Period. See the definition of directives in the GFF3 format: implicitly, the documentation uses spaces, just as ATpoint illustrated, so I recommend spaces, too. Tabs are what separate the columns of the feature lines.
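To make the convention concrete, here is a small Python sketch that writes directives with single spaces and feature lines with tabs. The helper names `format_directive` and `format_feature` and the example values are my own, not something taken from the specification.

```python
def format_directive(name, *values):
    """Build a '##' directive line, joining the name and its values with single spaces."""
    return "##" + " ".join([name, *map(str, values)])

def format_feature(seqid, source, type_, start, end,
                   score=".", strand=".", phase=".", attributes="."):
    """Build a nine-column feature line, joining the columns with tabs."""
    columns = [seqid, source, type_, start, end, score, strand, phase, attributes]
    return "\t".join(map(str, columns))

# Hypothetical example values.
print(format_directive("gff-version", 3))
print(format_directive("sequence-region", "chr1", 1, 10))
print(format_feature("chr1", ".", "gene", 1, 10, strand="+", attributes="ID=g1"))
```

Keeping the two joins in separate helpers makes it hard to mix up the two delimiters by accident.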

2

+1 for the space.
As you can see in the snapshots of the different versions of the format that I collected in my review of the format here: https://github.com/NBISweden/GAAS/blob/master/annotation/CheatSheet/gxf.md they have always used a space.
Let's ask them to clarify this in the repo of the GFF3 specification. ✅ => https://github.com/The-Sequence-Ontology/Specifications/issues/23

0
3.2 years ago
shoujun.gu ▴ 350

The GFF3 from GENCODE uses tabs.

Edit: sorry, I didn't notice that the post is about the header... Then it is just regular sentences, I think.

0

No, it isn't; it is a space, at least in the mouse (v20) files I have on my machine.

gzcat gencode.vM20.annotation.gff3.gz | head
##gff-version 3
#description: evidence-based annotation of the mouse genome (GRCm38), version M20 (Ensembl 95)
#provider: GENCODE
#contact: gencode-help@ebi.ac.uk
#format: gff3
#date: 2018-11-30
##sequence-region chr1 1 195471971

0

Yes, I just realized that the post deals with the header only.

0

I guess the header line is simply not standardized, but for the actual feature lines, yes, it is a tab, like in most bioinformatics formats.
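Given that, a reader can simply be lenient on the header and strict on the body. Below is a rough Python sketch of that idea; `parse_line` is a hypothetical helper, and splitting directives on any whitespace is my assumption about what a tolerant parser could do, not something the specification prescribes.

```python
def parse_line(line):
    """Classify one GFF3 line and split it into fields."""
    line = line.rstrip("\n")
    if not line:
        return ("blank", [])
    if line.startswith("##"):
        # Lenient: accept any run of spaces or tabs between directive fields.
        return ("directive", line[2:].split())
    if line.startswith("#"):
        return ("comment", [line[1:].strip()])
    # Strict: feature columns are separated by tabs only.
    return ("feature", line.split("\t"))

print(parse_line("##sequence-region chr1 1 195471971\n"))
print(parse_line("##sequence-region\tchr1\t1\t195471971\n"))
```

Both calls give the same fields, so downstream code does not need to care which delimiter the file's producer picked for the header.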