"gt gff3 -sortlines -tidy -retainids" removes some lines
6 weeks ago
153348734 • 0

==> cultivar_Lee.repeat.gff3 <==

chr1    .       Repeat  28      95      .       +       .       Name=LTR/Gypsy;Family=TE_00002776_INT

chr1    .       Repeat  222     267     .       +       .       Name=LTR/Gypsy;Family=TE_00002776_INT

chr1    .       Repeat  287     349     .       +       .       Name=LTR/Gypsy;Family=TE_00002776_INT

chr1    .       Repeat  430     472     .       +       .       Name=LTR/Gypsy;Family=TE_00002776_INT

chr1    .       Repeat  581     626     .       +       .       Name=LTR/Gypsy;Family=TE_00002776_INT

chr1    .       Repeat  646     708     .       +       .       Name=LTR/Gypsy;Family=TE_00002776_INT

chr1    .       Repeat  817     883     .       +       .       Name=LTR/Gypsy;Family=TE_00002776_INT

==> cultivar_Lee.repeat.sorted.gff3 <==

chr1    .       Repeat  2808    2875    .       +       .       Name=LTR/Gypsy;family=TE_00002776_INT

chr1    .       Repeat  2823    2903    .       +       .       Name=LTR/unknown;family=TE_00001651_INT

chr1    .       Repeat  4411    4473    .       +       .       Name=LTR/Gypsy;family=TE_00002776_INT

I used gt gff3 -sortlines -tidy -retainids cultivar_Lee.repeat.gff3 > cultivar_Lee.repeat.sorted.gff3 to sort my GFF3 file, but it deleted some lines. Why?

   428925 cultivar_Lee.repeat.sorted.gff3

  1194664 cultivar_Lee.repeat.gff3

More than half of the lines were deleted. I created the GFF file myself in R, and I want to index it with tabix. Thank you!

gff3 genome • 234 views
6 weeks ago

I think you changed more than the sort order: your input has 'Family=', which is not valid because attribute names starting with an uppercase letter are reserved in GFF3, while your output has 'family='.

There's a good chance that gt gff3 -tidy removes entries with identical IDs/Names/families, and you have Name=LTR/Gypsy;family=TE_00002776_INT many times. One solution is to add your own unique IDs per row.
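As a minimal sketch of the "unique IDs per row" idea, the function below prepends a running ID to column 9 of every feature line. The function name `add_unique_ids` and the `repeat_` prefix are made up for illustration, and the input is assumed to be tab-separated. Whether this actually stops gt from dropping lines is untested here.

```shell
# Hypothetical helper: reads GFF3 on stdin, writes GFF3 with ID=repeat_N added.
add_unique_ids() {
    awk 'BEGIN { FS = OFS = "\t" }
         # pass through comment/directive lines and anything without 9 columns
         /^#/ || NF < 9 { print; next }
         # prepend a running, unique ID to the attributes column
         { $9 = "ID=repeat_" (++n) ";" $9; print }'
}
```

It could be used as `add_unique_ids < cultivar_Lee.repeat.gff3 > cultivar_Lee.repeat.ids.gff3`, where the output file name is again just an example.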

A workaround is to use standard Linux sort instead:

sort -k 1,1 -k 4,4n cultivar_Lee.repeat.gff3 > cultivar_Lee.repeat.sorted.gff3
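Since the goal is tabix, here is a slightly more defensive sketch of the same sort. It keeps any `#` header lines at the top and sorts the rest by sequence name and then numeric start. The function name `sort_gff3` is made up, and the tab delimiter and `LC_ALL=C` are assumptions about the file rather than something from the question.

```shell
# Hypothetical helper: print a GFF3 file with headers first, features sorted.
sort_gff3() {
    # header/directive lines first, in their original order (may be none)
    grep '^#' "$1" || true
    # feature lines sorted by column 1 (seqid), then column 4 (start, numeric)
    grep -v '^#' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n
}

# Typical follow-up, assuming bgzip and tabix from htslib are installed:
#   sort_gff3 cultivar_Lee.repeat.gff3 | bgzip > cultivar_Lee.repeat.sorted.gff3.gz
#   tabix -p gff cultivar_Lee.repeat.sorted.gff3.gz
```

Comparing `wc -l` on the input and on the sorted output is a quick way to confirm that no lines were lost this time.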

Thanks for your help. I tried it, and it works for me!

6 weeks ago
Juke34 ★ 6.3k

You might find useful information here: https://agat.readthedocs.io/en/latest/topological-sorting-of-gff-features.html
