2
0
Entering edit mode
8.7 years ago
user ▴ 870

In the GFF3 format, I believe intron is a valid entry type along with exon, mRNA, etc. but is typically removed since it is redundant. Is there a tool that takes an input GFF3 file and adds the intron entries to it? Sometimes it's very helpful to have it in the file, even though the intron coordinates are implicit (for example if you want to map reads against exons and introns.)

If there's a program that does it out of the box I would like to know to avoid reinventing the wheel. In short, take a GFF3 file and print out all of its entries plus the intron entries that are implicit in the original GFF3 file. thanks.

edit just found Is Is Feasible To Produce Intron Gff According To Utr Gff And Cds Gff? -- genometools seems to be a solution. It'd be nice if there were other, lighter scripts that do this, since it seems excessive to install/compile genometools just for this purpose if you don't already use genometools (though genometools looks very useful in general)

coordinates gff3 conversion • 3.7k views
1
Entering edit mode
8.7 years ago

Yes, installing GenomeTools just to add introns may be excessive, but you might be surprised at what other useful functionality they provide. I've been using GenomeTools quite a bit for 2+ years now and I have to say I'm very pleased...

But as to your question, no I'm not aware of out-of-the-box script/program that will do this for you. I just threw together the following, but beware it doesn't sort the output--all of the introns will be printed at the end.

#!/usr/bin/env perl
use strict;

my $exon_lists = {}; while(my$line = <STDIN>)
{
next if($line =~ m/^###$/);
print $line; my @gff3 = split(/\t/,$line);
if($gff3[2] eq "exon") { my($mRNAid) = $gff3[8] =~ m/Parent=([^\s;]+)/; printf(STDERR "warning: exon has no 'Parent' attribute: %s",$line) unless($mRNAid);$exon_lists->{$mRNAid} = [] unless($exon_lists->{$mRNAid}); push(@{$exon_lists->{$mRNAid}}, \@gff3); } } foreach my$mRNAid(keys(%$exon_lists)) { my @sorted_exons = sort {$a->[3] <=> $b->[3] ||$a->[4] <=> $b->[4] } (@{$exon_lists->{$mRNAid}}); for(my$i = 1; $i < scalar(@sorted_exons);$i++)
{
printf("%s\n",
join("\t",
$sorted_exons[$i]->[0],
$sorted_exons[$i]->[1],
"intron",
$sorted_exons[$i-1]->[4] + 1,
$sorted_exons[$i]->[3] - 1,
$sorted_exons[$i]->[5],
$sorted_exons[$i]->[6],
$sorted_exons[$i]->[7],
"Parent=$mRNAid" ) ); } }  ADD COMMENT 0 Entering edit mode 3 months ago wuei06 • 0 1. #!/usr/bin/perl -w  use strict; use warnings; die "USAGE:\n$0  <gff_old> <gff_intron>" unless (@ARGV==2);
open (IN, "<$ARGV[0]") or die "$!";
open (OUT,">$ARGV[1]") or die "$!";
#chr01   .       gene    4870    7709    0.05    +       .       ID=Barthea00001;Augustus_transcriptSupport_percentage=0;Augustus_intronSupport=0/1;Source=augustus00001;Integrity=complete;
#chr01   .       mRNA    4870    7709    0.05    +       .       ID=Barthea00001.t1;Parent=Barthea00001;IntronSupport=0/1;Integrity=complete;
#chr01   .       exon    4870    5460    .       +       .       ID=Barthea00001.t1.exon1;Parent=Barthea00001.t1;
#chr01   .       CDS     4870    5460    0.87    +       0       ID=Barthea00001.t1.CDS1;Parent=Barthea00001.t1;
#chr01   .       exon    7455    7709    .       +       .       ID=Barthea00001.t1.exon2;Parent=Barthea00001.t1;
#chr01   .       CDS     7455    7709    0.05    +       0       ID=Barthea00001.t1.CDS2;Parent=Barthea00001.t1;
my @gene;
while(<IN>){
chomp $_; if(/gene/){ if(@gene){ my @exons; my$strand;
my $parent; my$chr;
my $j=0; my %p; map{ if(/exon/){ my @line=split/\t/,$_;
push @exons,($line[3],$line[4]);
$parent=(split/[;=]/,$line[8])[1];
$parent=~s/\.exon[1-9]+//;$chr=$line[0];$strand=$line[6];$p{$line[3]}=$j;
}
$j++; } @gene; if($#exons>1){
for(my $i = 0,my$n=0;$i<=$#exons-2;$i=$i+2,$n++){ my$k=$n+1; my$record;
if($strand eq "+"){$record=join "\t",($chr,".","intron",$exons[$i+1]+1,$exons[$i+2]-1,".","+",".","ID=$parent.intron$k;Parent=$parent");
}
if($strand eq "-"){$record=join "\t",($chr,".","intron",$exons[$i+3]+1,$exons[$i]-1,".","-",".","ID=$parent.intron$k;Parent=$parent");
}
splice @gene,$p{$exons[$i]}+$n+1,0,$record; } } %p=(); @exons=(); my$gene1=join "\n",@gene;
print OUT "$gene1\n"; @gene=(); } print OUT "$_\n";
}
else{
push @gene,\$_;
}
}
close OUT;
close IN;

0
Entering edit mode

I tried to use gt gff3 --addintrons in the package GenomeTools to display intron feature in my gff3 file, my gff file might be not strict gff3 format, and it did not work for me, so I wrote one myself, wish it helpful to you. My suggestion is to insert the introns next to the exon using function spice

1
Entering edit mode

There is the AGAT solution too with agat_sp_add_introns.pl, it is supposed to work on every cases.

0
Entering edit mode

Thanks for sharing, I'll fork this tool