I recommend sorting with the tool "gff3sort", because with standard Unix sort, lines that share the same chromosome and start position end up in an arbitrary order relative to one another.
gff3sort avoids this pitfall.
For example:
# Sort your gtf/gff & bgzip it
gff3sort.pl --precise --chr_order natural file.gtf/gff | bgzip > file.gtf/gff.gz;
# Create associated index
tabix -p gff file.gtf/gff.gz;
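Since tabix expects position-sorted input, it can be worth checking the uncompressed file before compressing and indexing it. Below is a minimal sketch using only awk. The function name check_sorted is my own (hypothetical), and it only verifies that each chromosome forms one contiguous block with non-decreasing start positions. It does not check the parent/child ordering that gff3sort takes care of.

```shell
# Hypothetical helper: succeeds (exit 0) if the GFF/GTF file given as $1 is
# grouped by chromosome and ordered by start position within each chromosome.
check_sorted() {
  awk -F'\t' '
    /^#/ || NF < 5 { next }          # skip comments, directives and short lines
    $1 != chrom {                    # a new chromosome block starts here
      if ($1 in seen) {
        printf "line %d: %s appears in more than one block\n", NR, $1
        bad = 1; exit
      }
      seen[$1] = 1; chrom = $1; prev = 0
    }
    $4 + 0 < prev {                  # start position went backwards
      printf "line %d: start %d comes after %d\n", NR, $4, prev
      bad = 1; exit
    }
    { prev = $4 + 0 }
    END { exit bad + 0 }
  ' "$1"
}

# Usage (file name is a placeholder):
# check_sorted file.sorted.gff && echo "looks position-sorted"
```

Because it only compares neighbouring lines, the check does not care whether chromosomes are in natural or alphabetical order, only that each one is contiguous.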
gff3sort.pl seems to make sure that lines without a "Parent=" attribute come before those with one when the chromosome and start position are the same. I think with standard Unix programs it would go like this:
$ (grep -v "Parent=" sortme.gff; grep "Parent=" sortme.gff) | sort -k1,1 -k4,4n -s
Shouldn't we also make sure that within these two groups the 5th column (the end position) is sorted as well? If so, we have to expand the command a little:
(grep -v "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n;grep "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n)| sort -k1,1 -k4,4n -s
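To see what the command above does, here is a small sketch on a made-up GFF3 file. The records, the IDs and the helper name sort_parent_last are all hypothetical. It wraps the same pipeline in a function and runs it on four features, three of which start at position 100:

```shell
# Hypothetical helper: same pipeline as above, taking the file as its argument.
sort_parent_last() {
  (grep -v "Parent=" "$1" | sort -k1,1 -k4,4n -k5,5n
   grep "Parent=" "$1" | sort -k1,1 -k4,4n -k5,5n) | sort -k1,1 -k4,4n -s
}

# Made-up input, deliberately out of order.
demo=$(mktemp -d)/sortme.gff
{
  printf 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=e1;Parent=m1\n'
  printf 'chr1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=m1;Parent=g1\n'
  printf 'chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=g1\n'
  printf 'chr1\tsrc\tgene\t50\t90\t.\t+\t.\tID=g0\n'
} > "$demo"

# Show feature type, start, end and attributes of the sorted result.
sort_parent_last "$demo" | cut -f3,4,5,9
```

The result lists g0 (start 50) first, then g1, e1 and m1 for start 100. The gene without "Parent=" leads its tie group as intended. Within the "Parent=" group, ties on start are broken by ascending end, so in this toy input the exon ends up before its mRNA. Whether that matters depends on the downstream tool.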
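If more speed is required, we can run the two pre-sorts concurrently with GNU parallel (-k keeps the output in argument order, which the final stable sort relies on):
parallel -k ::: 'grep -v "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n' 'grep "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n' | sort -k1,1 -k4,4n -s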
This blog talks about it: https://zhiganglu.com/post/sort-gff-topologically/
The script to use is
You will have to play with the -gvo parameter to get a GTF back as output.
One could use the method explained here. Reproducing it below:
$ wget --no-check-certificate https://raw.github.com/ctokheim/PrimerSeq/master/gtf.py -O gtf.py  # get command line script
$ python gtf.py -c your_gtf_file.gtf  # check if GTF is sorted
your_gtf_file.gtf is not correctly sorted. please sort before use.
$ python gtf.py -i your_gtf_file.gtf -o your_gtf_file.sorted.gtf  # GTF was not sorted, so sort it