Hi,
I have both StringTie (stringtie_merged.gtf.fasta.transdecoder.genome.gff3) and BRAKER2 (augustus.hints_utr.gff3) GFF3 files, which I would like to intersect with bedtools. Wherever a StringTie annotation covers a genome region and a BRAKER2 annotation overlaps it, I would like to remove the BRAKER2 annotation. However, if BRAKER2 is the only annotation in the region, I would like to keep it. Unfortunately, the following commands did not handle the last condition, as shown in this screenshot, e.g. for g83538.t1.
bedtools intersect -wa -a stringtie_merged.gtf.fasta.transdecoder.genome.gff3 -b augustus.hints_utr.gff3 > bedtools.gff3
perl gff3sort.pl --precise bedtools.gff3 > bedtools.gff3sort.gff
I also tried gff3_sp_complement_annotations.pl from the GAAS package, but it kept the BRAKER2 annotation despite it overlapping with StringTie, e.g. g83533.t1, as shown in this screenshot. How is it possible to remove a BRAKER2 annotation when it overlaps with StringTie?
What did I miss, or is there a better tool to intersect GFF3 files?
Thank you in advance,
The screenshot is difficult to read, so I don't know what the problem really is. I think we have to define in more detail what you consider as overlapping.
I don't know about bedtools intersect, but agat_sp_complement_annotations.pl considers two features to be overlapping if:
- they are on the same strand.
- they are of the same type (so, for example, an mRNA and a tRNA are not treated as overlapping).
- they overlap at CDS level (for mRNA) or at exon level (for other types of feature).
=> Thus two mRNAs can overlap (both kept in the output) if they are not on the same strand, because they are seen as two different loci.
=> Thus two mRNAs can overlap (both kept in the output) if they overlap only in their UTRs (because UTRs are rarely well defined).
=> One mRNA can overlap another mRNA if their CDSs do not overlap (one can be included in the intron of the other).
=> One mRNA can overlap a tRNA.
So the output reflects these rules. Which rule do you not agree with? We can adapt the script and add extra parameters.
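To make the rules above concrete, here is a minimal Python sketch of the overlap test as described in this answer. It is not the actual code of the script: the `Transcript` record, the function names, and the coordinates are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates, as in GFF3


@dataclass
class Transcript:
    # Hypothetical minimal record, not a data structure taken from AGAT/GAAS.
    name: str
    seqid: str
    strand: str
    ftype: str  # e.g. "mRNA" or "tRNA"
    exons: List[Interval] = field(default_factory=list)
    cds: List[Interval] = field(default_factory=list)


def intervals_overlap(xs: List[Interval], ys: List[Interval]) -> bool:
    # True if any interval in xs shares at least one base with any in ys.
    return any(s1 <= e2 and s2 <= e1 for s1, e1 in xs for s2, e2 in ys)


def features_overlap(a: Transcript, b: Transcript) -> bool:
    # Rules 1 and 2: same sequence, same strand, same feature type.
    if a.seqid != b.seqid or a.strand != b.strand or a.ftype != b.ftype:
        return False
    # Rule 3: compare CDS for mRNA, exons for every other feature type.
    if a.ftype == "mRNA":
        return intervals_overlap(a.cds, b.cds)
    return intervals_overlap(a.exons, b.exons)


def complement(reference: List[Transcript], addition: List[Transcript]) -> List[Transcript]:
    # Keep every reference feature, plus additions that overlap no reference feature.
    kept = [t for t in addition
            if not any(features_overlap(t, r) for r in reference)]
    return list(reference) + kept


# Made-up example: one StringTie-like mRNA and three BRAKER2-like mRNAs.
stringtie = [Transcript("st1", "chr1", "+", "mRNA",
                        exons=[(100, 500)], cds=[(200, 400)])]
braker = [
    Transcript("cds_hit", "chr1", "+", "mRNA",
               exons=[(150, 450)], cds=[(300, 350)]),
    Transcript("utr_only", "chr1", "+", "mRNA",
               exons=[(450, 900)], cds=[(600, 800)]),
    Transcript("other_strand", "chr1", "-", "mRNA",
               exons=[(100, 500)], cds=[(200, 400)]),
]
merged = complement(stringtie, braker)
print([t.name for t in merged])  # ['st1', 'utr_only', 'other_strand']
```

In this sketch only `cds_hit` is dropped. `utr_only` and `other_strand` survive even though their exons overlap `st1`, which is the kind of case that shows up as "kept despite overlapping" in a genome browser. A stricter definition, such as any exon overlap regardless of strand, would need a different test in `features_overlap`.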
P.S.: I see you have run GFF3sort after agat_sp_complement_annotations.pl. It is not needed: all scripts from GAAS with the prefix agat_sp produce output sorted in the same way.