Tool: AGAT - Another Gff Analysis Toolkit
Written 9 weeks ago by Juke-34 (Sweden):

AGAT - Another Gff Analysis Toolkit
Suite of tools to handle gene annotations in any GTF/GFF format. Available through conda for an easy install.

Why AGAT?

  • The first goal was to parse every case that can be encountered (I have listed more than 30). To my knowledge AGAT is the only tool able to handle all of them. How? By parsing in three ways concomitantly, with different priorities: i) using the parent/child relationship; ii) using a common tag to group features together (an attribute from the 9th column sharing the same "locus" value); iii) using a sequential approach (e.g. all exons are attached to the last gene met if neither of the first two approaches has worked).

  • The second goal was to create a fully standardised GFF3 file that could actually be used with any tool. Once again AGAT is the only one that fully recreates the missing information:

    • missing features (gene, mRNA, tRNA, exon, UTRs, etc.)
    • missing attributes (ID, Parent)

    and fixes wrong information:

    • make identifiers unique.
    • fix feature locations (e.g. an mRNA is stretched if it is shorter than its exons).
    • remove duplicated features.
    • merge overlapping loci (only if the option is activated, because this is not something we want for prokaryotes).

  • The third goal was to produce correctly topologically sorted output. To my knowledge AGAT is the only one dealing properly with this task. More information about it here.

  • Finally, building on the abilities described above, I have developed a toolkit to perform different tasks. Some are original, some are similar to what other tools offer, but within AGAT they benefit from the strengths of the first three points.
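To make the three-level grouping from the first point more concrete, here is a minimal sketch in Python. It is not AGAT's actual code (AGAT is written in Perl) and it covers only a tiny fraction of the real cases. The function names, the default tag name `locus_tag`, the assumption that `gene` is the only top-level type, and the assumption that parents appear before their children are all hypothetical simplifications for illustration.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Parse a GFF3-style 9th column ('ID=x;Parent=y') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def group_features(lines, common_tag="locus_tag"):
    """Group features under genes: {gene_key: [(feature_type, strategy), ...]}.

    Illustrative assumptions: 'gene' is the only top-level type and
    parents are listed before their children (single pass).
    """
    groups = defaultdict(list)
    id_to_gene = {}    # feature ID -> gene key, used by strategy i
    tag_to_gene = {}   # common tag value -> gene key, used by strategy ii
    last_gene = None   # most recent gene seen, used by strategy iii

    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        ftype, attrs = cols[2], parse_attributes(cols[8])
        tag = attrs.get(common_tag)

        if ftype == "gene":
            gene, how = attrs.get("ID", f"gene_{len(groups) + 1}"), "top-level"
            last_gene = gene
        elif attrs.get("Parent") in id_to_gene:      # i) parent/child
            gene, how = id_to_gene[attrs["Parent"]], "parent/child"
        elif tag in tag_to_gene:                     # ii) common tag
            gene, how = tag_to_gene[tag], "common tag"
        elif last_gene is not None:                  # iii) sequential
            gene, how = last_gene, "sequential"
        else:
            gene, how = "orphan", "unresolved"

        # Remember this feature so later lines can resolve against it.
        if "ID" in attrs:
            id_to_gene[attrs["ID"]] = gene
        if tag is not None:
            tag_to_gene.setdefault(tag, gene)
        groups[gene].append((ftype, how))
    return dict(groups)


# Made-up sample records, one locus per strategy.
SAMPLE = [
    "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=g1",
    "chr1\t.\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1",
    "chr1\t.\texon\t1\t50\t.\t+\t.\tParent=m1",
    "chr1\t.\tgene\t200\t300\t.\t+\t.\tID=g2;locus_tag=L2",
    "chr1\t.\texon\t200\t250\t.\t+\t.\tlocus_tag=L2",
    "chr1\t.\tgene\t400\t500\t.\t+\t.\tID=g3",
    "chr1\t.\texon\t400\t450\t.\t+\t.\tnote=x",
]

for gene, members in group_features(SAMPLE).items():
    print(gene, members)
```

The point of the fallback chain is that a feature is never dropped just because one kind of link is missing: each line is resolved by the strongest evidence available for it.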

A few examples among the >50 tools available:

  • check, fix, pad missing information into sorted and standardised gff3: agat_sp_gxf_to_gff3.pl
  • make statistics: agat_sp_statistics.pl
  • extract any type of sequence: agat_sp_extract_sequences.pl
  • complement annotations (non-overlapping loci): agat_sp_complement_annotations.pl
  • merge annotations: agat_sp_merge_annotations.pl
  • filter gene models by ORF size: agat_sp_filter_by_ORF_size.pl
  • filter to keep only longest isoforms: agat_sp_keep_longest_isoform.pl
  • create intron features: agat_sp_add_introns.pl
  • fix CDS phases: agat_sp_fix_cds_phases.pl
  • extract attributes: agat_sp_extract_attributes.pl
  • manage IDs: agat_sp_manage_IDs.pl
  • convert into tabulated format: agat_sp_to_tabulated.pl
  • compute sensitivity / specificity: agat_sp_sensitivity_specificity.pl
  • fusion / split analysis between two annotations: agat_sp_compare_two_annotations.pl
  • analyze differences between BUSCO results: agat_sp_compare_two_BUSCOs.pl

You should follow the more modern development model where tools work via subcommands rather than polluting the namespace with hard-to-discover script names.

instead of:

agat_sp_fix_cds_phases.pl

it should be:

agat fixcds

running:

agat

should produce a list of commands with short descriptions for each.

The approach was first introduced to bioinformatics by bwa then adopted by bedtools and other frameworks.
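A minimal sketch of that layout, written in Python with the standard `argparse` module rather than in AGAT's Perl. The subcommand names `fixcds` and `stats`, the handler functions, and the messages they return are hypothetical placeholders, not existing AGAT commands.

```python
import argparse


def fix_cds(args):
    # Placeholder: a real tool would call the CDS phase fixing logic here.
    return f"fixing CDS phases in {args.gff}"


def statistics(args):
    # Placeholder: a real tool would compute annotation statistics here.
    return f"computing statistics for {args.gff}"


def build_parser():
    parser = argparse.ArgumentParser(
        prog="agat", description="Toolkit exposed as one command with subcommands."
    )
    sub = parser.add_subparsers(dest="command", metavar="<command>")

    fixcds = sub.add_parser("fixcds", help="fix CDS phases")
    fixcds.add_argument("gff", help="input annotation file")
    fixcds.set_defaults(func=fix_cds)

    stats = sub.add_parser("stats", help="make statistics")
    stats.add_argument("gff", help="input annotation file")
    stats.set_defaults(func=statistics)
    return parser


def main(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command is None:
        # Bare invocation: list the available commands with their descriptions.
        parser.print_help()
        return 0
    print(args.func(args))
    return 0

# In a real script the entry point would be: raise SystemExit(main())
```

Calling `main([])` prints the help text with one line per subcommand, and `main(["fixcds", "annotation.gff"])` dispatches to the matching handler, so every tool stays discoverable from a single entry point.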

Written 7 weeks ago by Istvan Albert (modified 7 weeks ago)

Thank you for the feedback, you are right! Colleagues told me the same, but I have been lazy and haven't implemented it (yet). It is something I should do for version 1.0.0... if I find time to work on it :)

Written 7 weeks ago by Juke-34