I ran gff3_sp_complement_annotations.pl --ref stringtie.gff3 --add augustus.hints_utr.gff3 --out gaas.gff3
and got the below new IDs:
NbV1Ch01 transdecoder five_prime_UTR 99705 99705 . - . ID=nbis_NEW-five_prime_utr-4254;Parent=STRG2.t1
NbV1Ch01 transdecoder three_prime_UTR 98185 98185 . - . ID=nbis_NEW-three_prime_utr-2120;Parent=STRG2.t1
...
NbV1Ch01 AUGUSTUS start_codon 112448 112450 . - 0 ID=start_codon-70639;Parent=g65212.t1
NbV1Ch01 AUGUSTUS stop_codon 109839 109841 . - 0 ID=stop_codon-70662;Parent=g65212.t1
Additionally, I got the following message:
**********************************************************************************************************************************************************
* Primary tag values (3rd column) not expected => transcription_start_site transcription_end_site *
* Those primary tag are not yet taken into account ! *
* If you wish to use it/them, pleast update the parameter feature json files accordingly (features_level1, features_level2 or features_level3). *
* To resume: *
* - it must be a level1 feature if it has no parent. *
* - it must be a level2 feature if it has a parent and this parent is from level1. *
* - it must be a level3 feature if it has a parent and this parent has also a parent. *
* *
* Currently the tool just ignore them, So if they where Level1,level2, a gene or RNA feature will be created accordingly. *
**********************************************************************************************************************************************************
**********************************************************************************************************************************************************
* Primary tag values (3rd column) not expected => transcription_start_site transcription_start_site transcription_start_site transcription_start_site tr *
* Those values are not compatible with gff3 format and the tool cannot guess to which one they correspond to. *
* If you want to follow rigourously the gff3 format, please visit this website: *
* https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md *
* They provide tools to check the gff3 format. *
* Even if you have this warning, you should be able to use the gff3 output in most of gff3 tools. *
**********************************************************************************************************************************************************
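For reference, the three level rules quoted in the first warning can be sketched in a few lines of Python. This is only an illustration of the rule as the message words it, not GAAS's actual code. The `parent_of` mapping and the IDs in it are hypothetical.

```python
def feature_level(feature_id, parent_of):
    """Return 1, 2 or 3 following the rules quoted in the warning.

    parent_of maps a feature ID to its parent ID (None when it has no parent).
    """
    parent = parent_of.get(feature_id)
    if parent is None:
        return 1  # no parent: level1 (e.g. a gene)
    if parent_of.get(parent) is None:
        return 2  # parent has no parent itself: level2 (e.g. an mRNA)
    return 3      # parent also has a parent: level3 (e.g. exon, CDS, UTR)


# Hypothetical IDs, for illustration only.
parent_of = {
    "gene1": None,
    "mRNA1": "gene1",
    "tss1": "mRNA1",
}
levels = {fid: feature_level(fid, parent_of) for fid in parent_of}
```

Under this reading, a feature whose Parent is a transcript would belong in the level3 feature file.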
- How does GAAS find the new start_codon and end_codon?
- What are the advantages or disadvantages of enabling transcription_start_site and transcription_end_site?
- How can it be enabled?
Thank you in advance,
UPDATE
I ran agat_sp_complement_annotations.pl --ref transdecoder.genome.Fix.gff3 --add augustus.hints_utr.gff3 --out augustus.hints_utrAGAT.gff3
This is the screenshot of an area where I would like to keep only NBlab03G03860.1 and not NBlab03G03870.1, NBlab03G03880.1 and NBlab03G03890.1.
Please find here the genes in question as GFF3.
How can I remove the three small ones?
Thank you in advance
Please use agat_sp_complement_annotations.pl from AGAT instead. It contains some improvements.
Thank you, I used it.
Thank you. I noticed that agat_sp_complement_annotations.pl did not remove overlapping genes, as shown in my updated question at the top. What did I miss?
Thank you in advance
They do not overlap. To be considered overlapping, genes must overlap in their CDS parts.
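To illustrate the distinction, here is a minimal Python sketch of a CDS-level overlap test. It assumes both genes are on the same sequence and strand and that coordinates are 1-based and inclusive, as in GFF3. The function names and coordinates are hypothetical, and AGAT's actual implementation may differ in its details.

```python
def intervals_overlap(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def cds_overlap(cds_a, cds_b):
    """True if any CDS segment of one gene overlaps any CDS segment of the other."""
    return any(intervals_overlap(x, y) for x in cds_a for y in cds_b)


# Hypothetical coordinates, for illustration only.
gene_a_cds = [(100, 200), (900, 1000)]  # gene A spans 100..1000, with an intron
gene_b_cds = [(400, 500)]               # gene B lies inside gene A's intron
gene_c_cds = [(150, 250)]               # gene C shares coding bases with gene A

a_vs_b = cds_overlap(gene_a_cds, gene_b_cds)
a_vs_c = cds_overlap(gene_a_cds, gene_c_cds)
```

In this sketch, gene B falls within the span of gene A but shares no coding bases with it, so the two are not treated as overlapping, while gene C is.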
Thank you. By any chance, do you have a summary of the rules for when genes will be considered or not?
I already answered this question here A: intersecting two GFF3 missing data from the second file