How to sort a GFF3 file by chromosome order?
BioinfoBee, 13 months ago

Hello, I'm curious how to sort a GFF3 file by chromosome while keeping each parent feature (gene) together with its child features (mRNA, CDS and exon), in their original order:

Input example:

Chr6    EVM gene    212579245   212580018   .   +   .   ID=evm.TU.Chr6.3631;Name=EVM prediction Chr6.3631   
Chr6    EVM mRNA    212579245   212580018   .   +   .   ID=evm.model.Chr6.3631;Parent=evm.TU.Chr6.3631;Name=EVM prediction Chr6.3631
Chr6    EVM exon    212579245   212580018   .   +   .   ID=evm.model.Chr6.3631.exon1;Parent=evm.model.Chr6.3631
Chr6    EVM CDS 212579245   212580018   .   +   0   ID=cds.evm.model.Chr6.3631;Parent=evm.model.Chr6.3631
Chr5    EVM gene    240103107   240104618   .   +   .   ID=evm.TU.Chr5.3135;Name=EVM prediction Chr5.3135   
Chr5    EVM mRNA    240103107   240104618   .   +   .   ID=evm.model.Chr5.3135;Parent=evm.TU.Chr5.3135;Name=EVM prediction Chr5.3135
Chr5    EVM exon    240103107   240104618   .   +   .   ID=evm.model.Chr5.3135.exon1;Parent=evm.model.Chr5.3135
Chr5    EVM CDS 240103107   240104618   .   +   0   ID=cds.evm.model.Chr5.3135;Parent=evm.model.Chr5.3135
Chr3    EVM gene    3535391 3537315 .   -   .   ID=evm.TU.Chr3.57;Name=EVM prediction Chr3.57   
Chr3    EVM mRNA    3535391 3537315 .   -   .   ID=evm.model.Chr3.57;Parent=evm.TU.Chr3.57;Name=EVM prediction Chr3.57
Chr3    EVM exon    3535391 3535825 .   -   .   ID=evm.model.Chr3.57.exon3;Parent=evm.model.Chr3.57
Chr3    EVM exon    3535934 3536077 .   -   .   ID=evm.model.Chr3.57.exon2;Parent=evm.model.Chr3.57
Chr3    EVM exon    3536230 3537315 .   -   .   ID=evm.model.Chr3.57.exon1;Parent=evm.model.Chr3.57
Chr3    EVM CDS 3535391 3535825 .   -   0   ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr3    EVM CDS 3535934 3536077 .   -   0   ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr3    EVM CDS 3536230 3537315 .   -   0   ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57

Expected output example:

Chr3    EVM gene    3535391 3537315 .   -   .   ID=evm.TU.Chr3.57;Name=EVM prediction Chr3.57   
Chr3    EVM mRNA    3535391 3537315 .   -   .   ID=evm.model.Chr3.57;Parent=evm.TU.Chr3.57;Name=EVM prediction Chr3.57
Chr3    EVM exon    3535391 3535825 .   -   .   ID=evm.model.Chr3.57.exon3;Parent=evm.model.Chr3.57
Chr3    EVM exon    3535934 3536077 .   -   .   ID=evm.model.Chr3.57.exon2;Parent=evm.model.Chr3.57
Chr3    EVM exon    3536230 3537315 .   -   .   ID=evm.model.Chr3.57.exon1;Parent=evm.model.Chr3.57
Chr3    EVM CDS 3535391 3535825 .   -   0   ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr3    EVM CDS 3535934 3536077 .   -   0   ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr3    EVM CDS 3536230 3537315 .   -   0   ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr5    EVM gene    240103107   240104618   .   +   .   ID=evm.TU.Chr5.3135;Name=EVM prediction Chr5.3135   
Chr5    EVM mRNA    240103107   240104618   .   +   .   ID=evm.model.Chr5.3135;Parent=evm.TU.Chr5.3135;Name=EVM prediction Chr5.3135
Chr5    EVM exon    240103107   240104618   .   +   .   ID=evm.model.Chr5.3135.exon1;Parent=evm.model.Chr5.3135
Chr5    EVM CDS 240103107   240104618   .   +   0   ID=cds.evm.model.Chr5.3135;Parent=evm.model.Chr5.3135
Chr6    EVM gene    212579245   212580018   .   +   .   ID=evm.TU.Chr6.3631;Name=EVM prediction Chr6.3631   
Chr6    EVM mRNA    212579245   212580018   .   +   .   ID=evm.model.Chr6.3631;Parent=evm.TU.Chr6.3631;Name=EVM prediction Chr6.3631
Chr6    EVM exon    212579245   212580018   .   +   .   ID=evm.model.Chr6.3631.exon1;Parent=evm.model.Chr6.3631
Chr6    EVM CDS 212579245   212580018   .   +   0   ID=cds.evm.model.Chr6.3631;Parent=evm.model.Chr6.3631
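
A minimal Python sketch of the block-preserving sort asked for above. It assumes a tab-separated file (as GFF3 requires) in which every child line follows its gene line, as in the example, and it starts a new block at each line whose type column is gene. Blocks are ordered by chromosome using a natural key (so Chr2 comes before Chr10) and then by gene start. Directive and comment lines are moved to the top. The function names and the script name are hypothetical, and ### directives or features with no gene parent are not handled specially.

```python
import re
import sys


def natural_key(name):
    """Split e.g. 'Chr10' into ['chr', 10, ''] so that Chr2 sorts before Chr10."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


def sort_gene_blocks(lines):
    """Sort gene blocks by chromosome, then gene start; children stay with their gene."""
    headers, blocks = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # drop blank lines
        if line.startswith("#"):
            headers.append(line)  # keep directives/comments at the top
            continue
        cols = line.split("\t")  # assumes tab-separated columns, as GFF3 requires
        if cols[2] == "gene" or not blocks:
            blocks.append([line])  # a gene line opens a new block
        else:
            blocks[-1].append(line)  # child feature: attach to the current block

    def block_key(block):
        cols = block[0].split("\t")  # first line of the block, normally the gene
        return (natural_key(cols[0]), int(cols[3]))

    blocks.sort(key=block_key)  # list.sort is stable, so ties keep input order
    return headers + [line for block in blocks for line in block]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for out_line in sort_gene_blocks(fh):
            print(out_line)
```

Possible usage (script name hypothetical): python sort_gff3_blocks.py input.gff3 > output_sorted.gff3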

Regards, B

Uh? What about the sort command?

It does sort, but it doesn't keep the child features (mRNA, CDS, exon) in the proper order. For example, I can sort using: sort -k1,1 -k4,4n -k5,5n input.gff3 > output_sorted.gff3, but the output looks something like this:

Chr3    EVM exon    3535391 3535825 .   -   .   ID=evm.model.Chr3.57.exon3;Parent=evm.model.Chr3.57
Chr3    EVM gene    3535391 3537315 .   -   .   ID=evm.TU.Chr3.57;Name=EVM prediction Chr3.57   
Chr3    EVM mRNA    3535391 3537315 .   -   .   ID=evm.model.Chr3.57;Parent=evm.TU.Chr3.57;Name=EVM prediction Chr3.57
Chr3    EVM exon    3535934 3536077 .   -   .   ID=evm.model.Chr3.57.exon2;Parent=evm.model.Chr3.57
Chr3    EVM exon    3536230 3537315 .   -   .   ID=evm.model.Chr3.57.exon1;Parent=evm.model.Chr3.57
Chr3    EVM CDS 3535391 3535825 .   -   0   ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr3    EVM CDS 3535934 3536077 .   -   0   ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr3    EVM CDS 3536230 3537315 .   -   0   ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr5    EVM exon    240103107   240104618   .   +   .   ID=evm.model.Chr5.3135.exon1;Parent=evm.model.Chr5.3135
Chr5    EVM gene    240103107   240104618   .   +   .   ID=evm.TU.Chr5.3135;Name=EVM prediction Chr5.3135   
Chr5    EVM mRNA    240103107   240104618   .   +   .   ID=evm.model.Chr5.3135;Parent=evm.TU.Chr5.3135;Name=EVM prediction Chr5.3135
Chr5    EVM CDS 240103107   240104618   .   +   0   ID=cds.evm.model.Chr5.3135;Parent=evm.model.Chr5.3135
Chr6    EVM exon    212579245   212580018   .   +   .   ID=evm.model.Chr6.3631.exon1;Parent=evm.model.Chr6.3631
Chr6    EVM gene    212579245   212580018   .   +   .   ID=evm.TU.Chr6.3631;Name=EVM prediction Chr6.3631   
Chr6    EVM mRNA    212579245   212580018   .   +   .   ID=evm.model.Chr6.3631;Parent=evm.TU.Chr6.3631;Name=EVM prediction Chr6.3631
Chr6    EVM CDS 212579245   212580018   .   +   0   ID=cds.evm.model.Chr6.3631;Parent=evm.model.Chr6.3631
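
Why this happens: with -k4,4n -k5,5n, a gene and its first exon share the same start, and the exon's smaller end sorts it ahead of the gene. If a coordinate order is acceptable, one hypothetical Python sort key that still puts parents first sorts the longer feature first on a shared start (a parent spans its children) and breaks exact ties with an assumed type rank gene < mRNA < exon < CDS. Note that this interleaves exon and CDS lines, so it is a coordinate sort rather than the block layout in the expected output, and it is not a description of what any particular tool does.

```python
import re

# Assumed rank, used only to order features with identical coordinates.
TYPE_RANK = {"gene": 0, "mRNA": 1, "exon": 2, "CDS": 3}


def chrom_key(name):
    """Natural chromosome order: Chr2 before Chr10."""
    return [int(t) if t.isdigit() else t.lower() for t in re.split(r"(\d+)", name)]


def coordinate_key(line):
    cols = line.rstrip("\n").split("\t")  # assumes tab-separated GFF3
    start, end = int(cols[3]), int(cols[4])
    rank = TYPE_RANK.get(cols[2], len(TYPE_RANK))  # unknown types go last
    # -end: on a shared start, the longer (parent) feature comes first.
    return (chrom_key(cols[0]), start, -end, rank)


def coordinate_sort(lines):
    """Keep '#' lines on top, sort the remaining features by coordinate_key."""
    lines = list(lines)
    headers = [l for l in lines if l.startswith("#")]
    features = [l for l in lines if l.strip() and not l.startswith("#")]
    return headers + sorted(features, key=coordinate_key)
```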

You're sorting by chromosome AND co-ordinates. Try just sort -k1,1

Using sort -k1,1 doesn't keep the child features of each gene in their original order in the output.

Add -s. From the GNU sort man page:

   -s, --stable
          stabilize sort by disabling last-resort comparison
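
The same idea in Python, as a minimal sketch: Python's list sort is guaranteed stable, so sorting on the first column alone behaves like sort -s -k1,1. All lines of a gene share a chromosome, so they keep their original relative order. The function name is illustrative, the file is assumed to be tab-separated, and names are compared character by character (like sort under LC_ALL=C), so Chr10 would come before Chr2.

```python
def stable_chrom_sort(lines):
    """Sort on column 1 only; Python's sort is stable, so ties keep input order."""
    lines = [l.rstrip("\n") for l in lines if l.strip()]  # drop blank lines
    headers = [l for l in lines if l.startswith("#")]  # keep directives on top
    body = [l for l in lines if not l.startswith("#")]
    return headers + sorted(body, key=lambda l: l.split("\t", 1)[0])
```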

Those are your options with the sort utility: you can either keep the existing order (sort -s -k1,1) or re-order by coordinate. If you're looking for a way to address any possible sorting problems, try gff3sort. You could use its --precise option to sort entries in the recommended order. If that works, I'll add this as an answer and you can accept it.
