AGAT - Another Gff/Gtf Analysis Toolkit
A suite of tools to handle gene annotations in any GTF/GFF format. Available through conda and Docker for easy installation and use.
- The first idea was to be able to parse every GTF/GFF version, along with every underlying flavor that can be encountered (I have listed more than 30 cases).
To my knowledge, AGAT is the only tool able to handle all of them. How? By parsing in three ways at once, applied in order of priority:
- i) using parent/child relationship
- ii) using a common tag to group features together (an attribute in the 9th column that shares the same "locus" value)
- iii) using a sequential approach (e.g. all exons are attached to the last gene encountered if neither of the first two approaches has worked)
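The three-level priority can be pictured with a minimal Python sketch. This is not AGAT's actual code: the `Feature` class, the `assign_parents` function and the `locus_tag` attribute name are assumptions made for illustration, and the sketch only attaches features directly to genes.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    ftype: str                                  # 3rd GFF column, e.g. "gene" or "exon"
    attrs: dict = field(default_factory=dict)   # parsed 9th column


def assign_parents(features, common_tag="locus_tag"):
    """Return (feature, parent_id, method) tuples in file order.

    `common_tag` is a hypothetical attribute name used for approach ii).
    """
    known_ids = {f.attrs["ID"] for f in features if "ID" in f.attrs}
    gene_by_tag = {
        f.attrs[common_tag]: f.attrs["ID"]
        for f in features
        if f.ftype == "gene" and common_tag in f.attrs and "ID" in f.attrs
    }
    last_gene = None
    result = []
    for f in features:
        if f.ftype == "gene":
            last_gene = f.attrs.get("ID")
            result.append((f, None, "top-level"))
        elif f.attrs.get("Parent") in known_ids:
            # i) explicit parent/child relationship
            result.append((f, f.attrs["Parent"], "parent/child"))
        elif f.attrs.get(common_tag) in gene_by_tag:
            # ii) shared value of a common tag
            result.append((f, gene_by_tag[f.attrs[common_tag]], "common tag"))
        else:
            # iii) sequential fallback: attach to the last gene encountered
            result.append((f, last_gene, "sequential"))
    return result
```

Each feature is resolved by the first approach that succeeds, so a file that mixes well-formed and poorly formed records can still be grouped in a single pass.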
- The second idea was to be able to create a fully standardised GFF3 file that can be used with any tool. AGAT excels compared to many tools at creating the missing information:
- missing features (gene, mRNA, tRNA, exon, UTRs, etc.)
- missing attributes (ID, Parent)
and fixing wrong information:
- make identifiers unique.
- correct feature locations (e.g. an mRNA is extended if it is shorter than its exons).
- remove duplicated features.
- group related features (if spread in different places in the file).
- sort features.
- merge overlapping loci (only if the option is activated, because this is not desirable for prokaryotes)
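Three of these fixes can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not AGAT's implementation: all function names are hypothetical, coordinates are 1-based inclusive `(start, end)` tuples, and strand and sequence ID are ignored.

```python
def make_ids_unique(ids):
    """Append a numeric suffix to repeated identifiers."""
    used = set()
    unique = []
    for ident in ids:
        candidate, n = ident, 0
        while candidate in used:
            n += 1
            candidate = f"{ident}_{n}"
        used.add(candidate)
        unique.append(candidate)
    return unique


def extend_to_children(parent, children):
    """Stretch a parent span (e.g. an mRNA) so that it covers all its children."""
    start = min([parent[0]] + [c[0] for c in children])
    end = max([parent[1]] + [c[1] for c in children])
    return (start, end)


def merge_overlapping(loci):
    """Merge loci that overlap; meant to be applied only when requested."""
    merged = []
    for start, end in sorted(loci):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Keeping the merge step as a separate, opt-in function mirrors the point above: overlapping loci are legitimate in prokaryotic annotations, so merging should not happen by default.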
- The third idea was to produce correctly topologically sorted output. To my knowledge, AGAT is the only tool that handles this task properly. More information about it here.
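As a rough illustration of what topological sorting means here, the following Python sketch emits every parent before its children and orders siblings by start position. It is a simplified assumption rather than AGAT's algorithm: the dictionary keys `id`, `parent` and `start` are hypothetical, every feature is assumed to have a unique ID, and every `parent` is assumed to refer to a feature that exists.

```python
from collections import defaultdict


def topological_order(features):
    """Return features so that each parent precedes its children."""
    children = defaultdict(list)
    for f in features:
        children[f["parent"]].append(f)

    ordered = []

    def visit(parent_id):
        # Siblings are sorted by start coordinate, then expanded depth-first.
        for f in sorted(children[parent_id], key=lambda x: x["start"]):
            ordered.append(f)
            if f["id"] is not None:
                visit(f["id"])

    visit(None)  # top-level features have no parent
    return ordered
```

With this ordering, a gene is followed by its first mRNA, then that mRNA's exons, then the next mRNA, and so on, whatever order the records had in the input file.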
- Finally, based on the abilities described above, I have developed a toolkit to perform different tasks. Some tools are original and some are similar to what other tools offer, but within AGAT they all benefit from the strengths of the first three points.
A few examples among the >50 tools available:
- check, fix, and pad missing information to produce a sorted and standardised file:
- make statistics:
- extract any type of sequence:
- complement annotations (non-overlapping loci):
- merge annotations:
- filter gene models by ORF size:
- filter to keep only longest isoforms:
- create introns features:
- fix cds phases:
- extract attributes:
- manage IDs:
- convert into tabulated format:
- compute specificity and sensitivity:
- fusion / split analysis between two annotations:
- analyze differences between BUSCO results:
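To give a feel for what two of these tasks involve (creating intron features and fixing CDS phases), here is a small Python sketch. It does not reproduce the AGAT scripts: the function names are hypothetical, coordinates are 1-based inclusive, and the phase calculation assumes plus-strand segments that start at the first base of a codon.

```python
def introns_from_exons(exons):
    """Derive intron spans from the gaps between consecutive exons."""
    exons = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start - prev_end > 1:   # adjacent exons leave no intron
            introns.append((prev_end + 1, next_start - 1))
    return introns


def cds_phases(cds_segments, first_phase=0):
    """Compute the GFF3 phase of each CDS segment, in transcription order."""
    phases = []
    phase = first_phase
    for start, end in sorted(cds_segments):
        phases.append(phase)
        length = end - start + 1
        # Bases left over after complete codons decide the next phase.
        phase = (3 - (length - phase) % 3) % 3
    return phases
```

For a minus-strand transcript the segments would have to be processed from the highest coordinate to the lowest, which this sketch leaves out.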