Is it possible to sort a gff3 file on chromosome, position and then featuretype (gene, mRNA, exon, CDS)?
The order of the featuretypes is important when converting a gff file to a gtf file with gffread.
If the mRNA record is before the gene record, then the gene_id and gene_name are not set in the last column for the exons and CDS in the gtf file.
Bedtools sort only sorts on chromosome and position, not on featuretype.
The goal is to convert any gff3 file (that has genes, mRNA, exon and CDS records) into a GTF file that has exon and CDS records. The exon and CDS records in the GTF should have the transcript_id, gene_id and gene_name set.
When converting from gff3 to gtf via gffread, this requires the gene to be before the mRNA record.
Chr_01 BGI_GENES exon 2972 3128 . - . transcript_id "M0001760.1"; gene_id "G0001760"; gene_name "G0001760";
This is because of a downstream requirement for the gene_id and gene_name in the GTF.
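To make that requirement concrete, here is a minimal Python sketch of a check for it. The function name `missing_attributes` and the regular expression are my own illustration, not part of gffread, and it assumes tab-separated GTF lines with attributes written as `key "value";` like the record shown above.

```python
import re

# Attributes the downstream step needs on every exon/CDS record.
REQUIRED = ("transcript_id", "gene_id", "gene_name")

# Assumes GTF-style attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def missing_attributes(gtf_line):
    """Return the required attribute names absent from one GTF record."""
    cols = gtf_line.rstrip("\n").split("\t")
    attrs = dict(ATTR_RE.findall(cols[8]))  # column 9 holds the attributes
    return [name for name in REQUIRED if name not in attrs]
```

Running this over the exon and CDS lines of the converted file and reporting any non-empty result would show whether the gene records were picked up during conversion.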
Note that, depending on the source of your GFF3 file, the 'mRNA' feature type may be replaced by other terms such as transcript, miRNA, lncRNA, etc. Things can get a bit tricky with such a list. A hacky way of doing this would be to use something like awk to prepend a letter to each feature type so that they sort in the right order; something like wgene, xmRNA, yexon and zCDS. Then you just sort alphabetically on the prefixed feature type (after chromosome and position) and follow up with a sed step to remove the extra prefix.
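The same idea can be sketched in Python without rewriting the feature names, by giving each feature type a rank and using it as the last sort key. This is only a rough sketch under some assumptions: `FEATURE_RANK` and `sort_gff3` are names I made up, the rank table needs extending if your file uses transcript, miRNA, lncRNA and so on, and a trailing `##FASTA` section is not handled.

```python
# Assumed feature order; extend this if your file uses other parent types
# (for example give "transcript" or "lncRNA" the same rank as "mRNA").
FEATURE_RANK = {"gene": 0, "mRNA": 1, "exon": 2, "CDS": 3}


def sort_key(line):
    """Key: chromosome (as text), start (numeric), then feature rank."""
    cols = line.rstrip("\n").split("\t")
    # Unknown feature types sort after the known ones at the same position.
    rank = FEATURE_RANK.get(cols[2], len(FEATURE_RANK))
    # Chromosome names compare as plain strings, so unpadded names such as
    # Chr_2 and Chr_10 will not come out in numeric order.
    return (cols[0], int(cols[3]), rank)


def sort_gff3(lines):
    """Return comment/directive lines first, then the sorted records."""
    header = [l for l in lines if l.startswith("#")]
    records = [l for l in lines if l.strip() and not l.startswith("#")]
    return header + sorted(records, key=sort_key)
```

You would read the GFF3 lines from a file, pass them to `sort_gff3`, write the result out and then run gffread on it. Because a child feature never starts before its parent, breaking ties on the rank is enough to put the gene before its mRNA and the mRNA before its exons and CDS.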