I recommend sorting with the tool "gff3sort". With a standard Unix sort, lines that share the same chromosome and start position end up in an arbitrary order, so a child feature can be placed before its parent.
gff3sort avoids this pitfall.
For example:
# Sort your GTF/GFF file and compress it with bgzip
gff3sort.pl --precise --chr_order natural file.gff | bgzip > file.gff.gz
# Create the associated tabix index
tabix -p gff file.gff.gz
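Here is a minimal sketch of the pitfall using a hypothetical two-line snippet: a parent mRNA and a child exon that start at the same position. LC_ALL=C is an addition, used only so that collation is plain byte order and the result is reproducible. Because the chromosome and start keys tie, sort falls back to comparing whole lines, and "exon" sorts before "mRNA".

```shell
#!/usr/bin/env bash
# Hypothetical GFF3 snippet: a parent mRNA and its child exon that share
# chromosome (chr1) and start position (100).
gff=$'chr1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=tx1\nchr1\tsrc\texon\t100\t200\t.\t+\t.\tID=ex1;Parent=tx1'

# Sort by chromosome, then numerically by start. Both keys tie, so sort
# compares the whole lines as a last resort (byte order under LC_ALL=C):
# "exon" < "mRNA", so the child is printed before its parent.
sorted=$(printf '%s\n' "$gff" | LC_ALL=C sort -k1,1 -k4,4n)
printf '%s\n' "$sorted"
```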
gff3sort.pl seems to ensure that lines without a "Parent=" attribute come before those that have one when the chromosome and start position are the same. With standard Unix tools, I think it should go like this:
(grep -v "Parent=" sortme.gff; grep "Parent=" sortme.gff) | sort -k1,1 -k4,4n -s
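A quick check of this grep/stable-sort trick, sketched on a hypothetical three-line sortme.gff created in a temporary directory (LC_ALL=C is again added only for reproducible collation). The mRNA has no Parent= attribute and shares its start with one of its exons, so it should come out first even though it is listed last in the input.

```shell
#!/usr/bin/env bash
# Hypothetical sample, written to sortme.gff in a fresh temporary directory.
# The two exons (children of tx1) are listed before their parent on purpose.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  $'chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=ex1;Parent=tx1' \
  $'chr1\tsrc\texon\t300\t500\t.\t+\t.\tID=ex2;Parent=tx1' \
  $'chr1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=tx1' > sortme.gff

# No-Parent lines first, then Parent lines; the stable sort (-s) keeps
# that relative order whenever chromosome and start tie.
(grep -v "Parent=" sortme.gff; grep "Parent=" sortme.gff) \
  | LC_ALL=C sort -k1,1 -k4,4n -s > sorted.gff
cat sorted.gff
```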
Shouldn't we also make sure that, within each of these two groups, the 5th column (end position) is sorted as well? If so, the command needs to be expanded a little:
(grep -v "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n;grep "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n)| sort -k1,1 -k4,4n -s
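If reading the file twice is a concern, the same order (chromosome, start, no-Parent group first, then end) can likely be produced by a single sort over a temporary flag column. This is a sketch of an alternative, not what gff3sort does: the awk flag column and the hypothetical sample are assumptions, and the last lines compare the result against the two-group command above on that sample.

```shell
#!/usr/bin/env bash
# Hypothetical sample (tab-separated like GFF3; column 9 = attributes).
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  $'chr1\tsrc\texon\t100\t400\t.\t+\t.\tID=ex2;Parent=tx1' \
  $'chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=ex1;Parent=tx1' \
  $'chr1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=tx1' > sortme.gff

# Prefix each line with a flag: 0 = no "Parent=", 1 = has "Parent=" (same
# test as grep). Sort once by chromosome, start, flag and end (fields shift
# by one because of the prefix), then strip the flag column again.
awk 'BEGIN { OFS = "\t" } { flag = (index($0, "Parent=") > 0); print flag, $0 }' sortme.gff \
  | LC_ALL=C sort -t $'\t' -k2,2 -k5,5n -k1,1n -k6,6n \
  | cut -f2- > one_pass.gff

# The two-group command, for comparison.
{
  grep -v "Parent=" sortme.gff | LC_ALL=C sort -k1,1 -k4,4n -k5,5n
  grep "Parent=" sortme.gff | LC_ALL=C sort -k1,1 -k4,4n -k5,5n
} | LC_ALL=C sort -k1,1 -k4,4n -s > two_groups.gff

diff one_pass.gff two_groups.gff && echo "same order"
```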
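If more speed is required, the two group sorts can run concurrently with GNU parallel. Add -k (--keep-order) so the output of the no-Parent job always comes before that of the Parent job; by default parallel prints each job's output as soon as it completes, and the final stable sort would then inherit the wrong order:
parallel -k ::: 'grep -v "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n' 'grep "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n' | sort -k1,1 -k4,4n -s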
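If GNU parallel is not installed, here is a sketch of the same idea with plain bash background jobs (the sample and the intermediate file names are hypothetical). Each group is sorted into its own file concurrently, `wait` blocks until both finish, and the files are concatenated in a fixed order so the no-Parent group still comes first before the stable sort.

```shell
#!/usr/bin/env bash
# Hypothetical sample in a fresh temporary directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  $'chr2\tsrc\tgene\t10\t90\t.\t+\t.\tID=g2' \
  $'chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=ex1;Parent=tx1' \
  $'chr1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=tx1' > sortme.gff

# Sort the two groups concurrently, each into its own file.
grep -v "Parent=" sortme.gff | LC_ALL=C sort -k1,1 -k4,4n -k5,5n > no_parent.part &
grep "Parent=" sortme.gff | LC_ALL=C sort -k1,1 -k4,4n -k5,5n > with_parent.part &
wait  # block until both background pipelines have finished

# Concatenate in a fixed order (no-Parent group first), then stable-sort.
cat no_parent.part with_parent.part | LC_ALL=C sort -k1,1 -k4,4n -s > sorted.gff
cat sorted.gff
```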