Question: Can't sort GFF file ordered by chromosome (chr1, chr2, chr3, ..., chrX)
unique379 (Spain) wrote, 4.3 years ago:

Dear all,

I am trying to sort my GFF file in ascending order by chromosome (chr1, chr2, chr3, ..., chrX) but have not succeeded. Neither sortBed nor Unix sort produces a karyotype order (chr1, chr2, ..., chr10, chr11, chrM, chrX). I found one possible solution, sort -V -k1,1, which works fine on another system of mine (CentOS; sort version 8.22), but unfortunately sort on my main system (RedHat; sort version 5.97) does not have the -V option. Is there any alternative?

Note: please keep in mind that I am not sorting my GFF file like a typical BED file (sort -k 1,1 -k2,2n), because it is not a typical BED file.

My input.gff looks like:

chr1    .    miRNA_primary_transcript    451141    451218    .    +    .
chr1    .    miRNA_primary_transcript    1275348    1275428    .    +    .
chr1    .    miRNA_primary_transcript    2806071    2806208    .    +    .
chr10    .    miRNA_primary_transcript    4333896    4333977    .    -    .
chr10    .    miRNA_primary_transcript    10295360    10295450    .
chr10    .    miRNA_primary_transcript    15983153    15983233    .
chr11    .    miRNA_primary_transcript    2162553    2162662    .    -    .
chr11    .    miRNA_primary_transcript    3157038    3157122    .    +    .
chr2    .    miRNA_primary_transcript    59942577    59942660    .
chr20    .    miRNA_primary_transcript    5116644    5116774    .    +    .
chr25    .    miRNA_primary_transcript    35855072    35855176    .
chr3    .    miRNA_primary_transcript    13208734    13208831    .
......

Total number of chromosomes: 25.

Tags: bash, rna-seq, next-gen

see How To Sort Bed Format File

(comment written 4.3 years ago by Pierre Lindenbaum)

Those answers won't work for GFF as is, because you have declarations and comments in the header, and most GFF files will have comment lines separating each feature. You could skip all those lines, but that would create something that would not be a usable GFF for most purposes.

(comment written 4.3 years ago by SES)
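To make the header problem above concrete, here is a minimal sketch that copies the leading block of "#" lines through unchanged and sorts only the feature lines, by column 1 and then numerically by the start coordinate in column 4. The function name sort_gff_keep_header is hypothetical, and the sketch assumes a tab-delimited file passed as a path (it reads the file twice). It still discards comment lines that appear between features, and its plain lexical sort on column 1 still puts chr10 before chr2, so it illustrates the limitation rather than solving it.

```shell
# Hypothetical helper (not from the thread): sort GFF feature lines while
# keeping the leading header block at the top. Assumes tab-delimited input.
sort_gff_keep_header() {
    tab=$(printf '\t')
    # Copy the leading run of "#" lines through unchanged, then stop reading.
    awk '/^#/ { print; next } { exit }' "$1"
    # Sort the remaining non-comment, non-blank lines by seqid, then by start.
    # Comment lines between features are dropped at this step.
    awk '!/^#/ && NF' "$1" | LC_ALL=C sort -t "$tab" -k1,1 -k4,4n
}
```

Usage would be something like: sort_gff_keep_header input.gff > sorted.gff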
SES (Vancouver, BC) wrote, 4.3 years ago:

I would use (from GenomeTools):

gt gff3 -sort file.gff3 > file_sort.gff3

to sort by coordinates. That may not put the chromosomes in numerical order, but the file will be correctly sorted by coordinate. If you really want them in numerical order (I can't think why this would be necessary), you can do this with a script, or someone like Pierre can probably do some sort tricks to get the headers and everything in the right order. I would guess that the different chromosome naming schemes are why having them in a certain order is not a requirement, though having the features sorted by coordinate is important.

edit: I see you have identifiers like "chrM" and "chrX" in your data. How should they be ordered with respect to the other chromosomes? You'll definitely have to write a script if you have something specific in mind, or come up with a shell command to order them for you (though I suspect that may get complex).

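As a rough illustration of the kind of shell command the answer alludes to, the sketch below gets a karyotype-style order without sort -V by temporarily prepending a sort key to each line. It assumes tab-delimited input and sequence names of the form chr<number> or chr<letters>; numbered chromosomes are ordered numerically, and everything else (chrM, chrX, ...) is placed after them in alphabetical order, which matches the order given in the question. The function name sort_gff_karyotype and the rank value 999999 are arbitrary choices for this sketch, and all "#" lines are dropped, so any header would have to be re-attached separately.

```shell
# Hypothetical helper (not from the thread): karyotype-style sort of GFF
# feature lines using only awk, sort -k/-n/-t and cut (no sort -V needed).
sort_gff_karyotype() {
    tab=$(printf '\t')
    awk -F '\t' -v OFS='\t' '
        # Skip comment, directive and blank lines.
        /^#/ || NF == 0 { next }
        {
            name = $1
            sub(/^[Cc]hr/, "", name)
            # Numbered chromosomes rank by their number; all others go last.
            rank = (name ~ /^[0-9]+$/) ? name + 0 : 999999
            # Prepend two helper columns: numeric rank and bare name.
            print rank, name, $0
        }
    ' "$1" |
    # Column 6 of the decorated line is the original column 4 (start).
    LC_ALL=C sort -t "$tab" -k1,1n -k2,2 -k6,6n |
    # Remove the two helper columns again.
    cut -f3-
}
```

Usage would be something like: sort_gff_karyotype input.gff > sorted.gff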