Can't sort GFF file by chromosome order (chr1, chr2, chr3 ... chrX).
8.3 years ago
unique379 ▴ 110

Dear all,

I am trying to sort my GFF file in ascending chromosome order (chr1, chr2, chr3 ... chrX) but have not succeeded. Neither sortBed nor Unix sort produces a karyotype order (chr1, chr2, ... chr10, chr11, chrM, chrX). I found one possible solution, sort -V -k1,1, which works fine on my other system (CentOS; sort version 8.22), but unfortunately the sort on my main system (RedHat; sort version 5.97) does not have the -V option. Is there any alternative?

Note: please keep in mind that I am not sorting my GFF file like a typical BED file (sort -k 1,1 -k2,2n), as this is not a typical BED file.

My input.gff looks like:

chr1    .    miRNA_primary_transcript    451141    451218    .    +    .
chr1    .    miRNA_primary_transcript    1275348    1275428    .    +    .
chr1    .    miRNA_primary_transcript    2806071    2806208    .    +    .
chr10    .    miRNA_primary_transcript    4333896    4333977    .    -    .
chr10    .    miRNA_primary_transcript    10295360    10295450    .
chr10    .    miRNA_primary_transcript    15983153    15983233    .
chr11    .    miRNA_primary_transcript    2162553    2162662    .    -    .
chr11    .    miRNA_primary_transcript    3157038    3157122    .    +    .
chr2    .    miRNA_primary_transcript    59942577    59942660    .
chr20    .    miRNA_primary_transcript    5116644    5116774    .    +    .
chr25    .    miRNA_primary_transcript    35855072    35855176    .
chr3    .    miRNA_primary_transcript    13208734    13208831    .

Total number of chromosomes: 25.

bash RNA-Seq next-gen • 4.1k views

Those answers won't work for GFF as is, because you have declarations and comments in the header, and most GFF files will have comment lines separating each feature. You could skip all those lines, but that would create something that would not be a usable GFF for most purposes.

8.3 years ago
SES 8.6k

I would use (from GenomeTools):

gt gff3 -sort file.gff3 > file_sort.gff3

to sort by coordinates. That may not put the chromosomes in numerical order, but the file will be correctly sorted by coordinate. If you really want them in numerical order (I can't think why this would be necessary), then you can do this with a script, or someone like Pierre can probably do some sort tricks to get the headers and everything in the right order. I would guess that the different chromosome naming schemes are why having them in a certain order is not a requirement, though having the features sorted by coordinate is important.

Edit: I see you have identifiers like "ChrM" and "ChrX" in your data. How should they be ordered with respect to the other chromosomes? You'll definitely have to write a script if you have something specific in mind, or come up with a shell command to order them for you (though I suspect that may get complex).
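For what it's worth, the script route could look something like the rough Python sketch below. It does not need sort -V. It is only an illustration under a few assumptions of mine, not something from GenomeTools: the file is tab-delimited with the start and end coordinates in columns 4 and 5, numbered chromosomes come first in numeric order, and everything else (chrM, chrX, ...) follows alphabetically, which matches the order asked for in the question. The names chrom_key and sort_gff_lines are made up for this example. Note that every line starting with "#" is moved to the top, so comment lines that separated features in the input lose their position.

```python
import re


def chrom_key(name):
    """Sort key: numbered chromosomes first (numerically), then the rest by name."""
    # Assumption: names look like "chr12" or "12"; anything else is non-numeric.
    match = re.fullmatch(r"(?:chr)?(\d+)", name, flags=re.IGNORECASE)
    if match:
        return (0, int(match.group(1)), "")
    return (1, 0, name)


def sort_gff_lines(lines):
    """Return header/comment lines followed by features in karyotype order."""
    header = []
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # drop blank lines
        if line.startswith("#"):
            header.append(line)  # all "#" lines are hoisted to the top
        else:
            features.append(line)

    def feature_key(line):
        # Assumption: tab-delimited GFF, start in column 4, end in column 5.
        fields = line.split("\t")
        return (chrom_key(fields[0]), int(fields[3]), int(fields[4]))

    return header + sorted(features, key=feature_key)
```

Passing an open file handle for input.gff to sort_gff_lines and writing the returned lines out joined by newlines gives the sorted file. If you need a different placement for chrM or chrX, only chrom_key has to change.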

