File conversion from GTF to GFF3 for EVidenceModeler
8 months ago
rj.rezwan ▴ 10

Hi, could you please advise how to convert the StringTie output file stringtie_transcript.gtf into GFF3 format, so that it can be used as input to EVidenceModeler (EVM) for genome annotation?


gff3 stringtie gtf • 1.1k views

I used AGAT for the conversion. It converted the file, but EVM could not process the result properly.


Show us your AGAT command, the first few lines of the output and the error produced by the tool that should accept the GFF3.

Also, do not paste screenshots of plain-text content; it is counterproductive. You can copy and paste the content directly here (using the code formatting option), or use a GitHub Gist if the content exceeds the length allowed here.


This is the AGAT command:

agat_convert_sp_gxf2gxf.pl -g stringtie_transcript.gtf -o stringtie_transcript.gff3

The output file, stringtie_transcript.gff3, looks like this:

##gff-version 3
# stringtie -p 32 -o stringtie_transcript.gtf merged.bam
# StringTie version 2.2.0
ptg000009l      StringTie       gene    41837   52090   1000    -       .       ID=STRG.5169;fPKM=9.309661;tPM=16.805527;cov=1282.609497;gene_id=STRG.5169;transcript_id=STRG.5169.1
ptg000009l      StringTie       transcript      41837   52090   1000    -       .       ID=STRG.5169.1;Parent=STRG.5169;fPKM=9.309661;tPM=16.805527;cov=1282.609497;gene_id=STRG.5169;transcript_id=STRG.5169.1
ptg000009l      StringTie       exon    41837   42631   1000    -       .       ID=exon-46382;Parent=STRG.5169.1;cov=1685.287109;exon_number=1;gene_id=STRG.5169;transcript_id=STRG.5169.1
ptg000009l      StringTie       exon    43439   43687   1000    -       .       ID=exon-46383;Parent=STRG.5169.1;cov=1713.453979;exon_number=2;gene_id=STRG.5169;transcript_id=STRG.5169.1
ptg000009l      StringTie       exon    43785   44046   1000    -       .       ID=exon-46384;Parent=STRG.5169.1;cov=1426.013306;exon_number=3;gene_id=STRG.5169;transcript_id=STRG.5169.1
ptg000009l      StringTie       exon    45662   45750   1000    -       .       ID=exon-46385;Parent=STRG.5169.1;cov=1408.125854;exon_number=4;gene_id=STRG.5169;transcript_id=STRG.5169.1
ptg000009l      StringTie       exon    45835   46019   1000    -       .       ID=exon-46386;Parent=STRG.5169.1;cov=1306.796997;exon_number=5;gene_id=STRG.5169;transcript_id=STRG.5169.1
ptg000009l      StringTie       exon    50293   50401   1000    -       .       ID=exon-46387;Parent=STRG.5169.1;cov=976.727051;exon_number=6;gene_id=STRG.5169;transcript_id=STRG.5169.1
ptg000009l      StringTie       exon    51103   51159   1000    -       .       ID=exon-46388;Parent=STRG.5169.1;cov=908.881287;exon_number=7;gene_id=STRG.5169;transcript_id=STRG.5169.1
ptg000009l      StringTie       exon    51289   51552   1000    -       .       ID=exon-46389;Parent=STRG.5169.1;cov=955.376221;exon_number=8;gene_id=STRG.5169;transcript_id=STRG.5169.1
ptg000009l      StringTie       exon    51709   52090   1000    -       .       ID=exon-46390;Parent=STRG.5169.1;cov=393.622467;exon_number=9;gene_id=STRG.5169;transcript_id=STRG.5169.1
ptg000009l      StringTie       transcript      41837   52090   1000    -       .       ID=STRG.5169.2;Parent=STRG.5169;fPKM=0.687508;tPM=1.241070;cov=94.719322;gene_id=STRG.5169;transcript_id=STRG.5169.2
ptg000009l      StringTie       exon    41837   42631   1000    -       .       ID=exon-46391;Parent=STRG.5169.2;cov=123.055481;exon_number=1;gene_id=STRG.5169;transcript_id=STRG.5169.2
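To see at a glance what a converted file contains, the feature types in column 3 can be tallied. A minimal Python sketch, using three records from the excerpt above (rejoined with tabs, attributes shortened); the function name `count_feature_types` is made up for illustration:

```python
from collections import Counter

def count_feature_types(gff3_text):
    """Count column-3 feature types, skipping comments and blank lines."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed 9-column GFF3 record
        counts[fields[2]] += 1
    return counts

# Three records taken from the AGAT output above, attributes shortened.
sample = "\n".join([
    "##gff-version 3",
    "\t".join(["ptg000009l", "StringTie", "gene", "41837", "52090",
               "1000", "-", ".", "ID=STRG.5169"]),
    "\t".join(["ptg000009l", "StringTie", "transcript", "41837", "52090",
               "1000", "-", ".", "ID=STRG.5169.1;Parent=STRG.5169"]),
    "\t".join(["ptg000009l", "StringTie", "exon", "41837", "42631",
               "1000", "-", ".", "ID=exon-46382;Parent=STRG.5169.1"]),
])

feature_counts = count_feature_types(sample)
print(dict(feature_counts))
```

On the excerpt this reports only gene, transcript and exon features, i.e. a gene-structure hierarchy.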

What about the error produced by the tool you're using?


I would definitely be interested to understand what EVM dislikes in the AGAT output!


AGAT produced the output, but EVM did not handle it properly. Below are the EVM job script and the errors, which may help in understanding the issue I have been facing. I cancelled the job after 25 hours because it had not produced any output file. When I looked into the log file, it contained the following errors.

#!/bin/bash
#
#SBATCH --job-name=EVM-annotation
#SBATCH --output=EVM_annotation.%j.out
#SBATCH --partition=batch
#SBATCH --cpus-per-task=32
#SBATCH --time=25:00:00
#SBATCH --mem=800G

module load evidencemodeler/2.1.0

evidence_modeler.pl \
--CPU 32 \
--sample_id accesion_1 \
--genome accession_assembly.bp.p_ctg.fasta \
--weights weights.txt \
--gene_predictions accession_annotation.gff3 \
--transcript_alignments stringtie_transcript.gff3 \
--segmentSize 100000 \
--overlapSize 10000
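One thing worth checking before a long run: as the EVM documentation describes it, each line of the weights file gives an evidence class, a source name and a numeric weight, and the source name is expected to match column 2 of the corresponding GFF3 input. A minimal Python sketch of that consistency check follows. The weights content is invented for illustration (the actual weights.txt was not posted), and the helper names are hypothetical:

```python
def weights_sources(weights_text):
    """Collect the source names (2nd column) from a 3-column weights file."""
    sources = set()
    for line in weights_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and not line.startswith("#"):
            sources.add(parts[1])
    return sources

def gff3_sources(gff3_text):
    """Collect the source names (2nd column) used in a GFF3 file."""
    sources = set()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            sources.add(fields[1])
    return sources

# Assumed weights content; the real file is not shown in this thread.
weights_example = "ABINITIO_PREDICTION\tHelixer\t1\nTRANSCRIPT\tStringTie\t10\n"
gff3_example = (
    "##gff-version 3\n"
    "ptg000009l\tStringTie\texon\t41837\t42631\t1000\t-\t.\tID=exon-46382\n"
)

# Sources present in the GFF3 but absent from the weights file.
missing = gff3_sources(gff3_example) - weights_sources(weights_example)
print(sorted(missing))
```

An empty result means every source in the evidence file has a weight assigned; it says nothing about whether the features themselves are in the layout EVM expects.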

Here is a chunk of the errors from the log file:

Error with prediction: Helixer lend_intergenic: 27515088, rend_intergenic: 27514801  at /ibex/sw/csi/evidencemodeler/2.1.0/linux_binary/EVidenceModeler-v2.1.0/EvmUtils/evidence_modeler.pl line 3220.
Error with prediction: Helixer lend_intergenic: 27516650, rend_intergenic: 27516069  at /ibex/sw/csi/evidencemodeler/2.1.0/linux_binary/EVidenceModeler-v2.1.0/EvmUtils/evidence_modeler.pl line 3220.
Error with prediction: Helixer lend_intergenic: 27694492, rend_intergenic: 27694334  at /ibex/sw/csi/evidencemodeler/2.1.0/linux_binary/EVidenceModeler-v2.1.0/EvmUtils/evidence_modeler.pl line 3220.
Error with prediction: Helixer lend_intergenic: 27864836, rend_intergenic: 27863825  at /ibex/sw/csi/evidencemodeler/2.1.0/linux_binary/EVidenceModeler-v2.1.0/EvmUtils/evidence_modeler.pl line 3220.
Error with prediction: Helixer lend_intergenic: 27930932, rend_intergenic: 27929972  at /ibex/sw/csi/evidencemodeler/2.1.0/linux_binary/EVidenceModeler-v2.1.0/EvmUtils/evidence_modeler.pl line 3220.
Error with prediction: Helixer lend_intergenic: 28032374, rend_intergenic: 28032346  at /ibex/sw/csi/evidencemodeler/2.1.0/linux_binary/EVidenceModeler-v2.1.0/EvmUtils/evidence_modeler.pl line 3220.
Error with prediction: Helixer lend_intergenic: 28480320, rend_intergenic: 28478468  at /ibex/sw/csi/evidencemodeler/2.1.0/linux_binary/EVidenceModeler-v2.1.0/EvmUtils/evidence_modeler.pl line 3220.
Error with prediction: Helixer lend_intergenic: 28559565, rend_intergenic: 28559434  at /ibex/sw/csi/evidencemodeler/2.1.0/linux_binary/EVidenceModeler-v2.1.0/EvmUtils/evidence_modeler.pl line 3220.
Error with prediction: Helixer lend_intergenic: 28629002, rend_intergenic: 28628882  at /ibex/sw/csi/evidencemodeler/2.1.0/linux_binary/EVidenceModeler-v2.1.0/EvmUtils/evidence_modeler.pl line 3220.
Error with prediction: Helixer lend_intergenic: 28880043, rend_intergenic: 28880019  at /ibex/sw/csi/evidencemodeler/2.1.0/linux_binary/EVidenceModeler-v2.1.0/EvmUtils/evidence_modeler.pl line 3220.
Error with prediction: Helixer lend_intergenic: 28880189, rend_intergenic: 28880136  at /ibex/sw/csi/evidencemodeler/2.1.0/linux_binary/EVidenceModeler-v2.1.0/EvmUtils/evidence_modeler.pl line 3220.
Error with prediction: Helixer lend_intergenic: 28999684, rend_intergenic: 28999649  at /ibex/sw/csi/evidencemodeler/2.1.0/linux_binary/EVidenceModeler-v2.1.0/EvmUtils/evidence_modeler.pl line 3220
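Every message in this excerpt names Helixer, and in each one lend_intergenic is larger than rend_intergenic. That suggests, though it does not prove, that these errors relate to the --gene_predictions input rather than to the StringTie file. A small Python sketch for summarising such a log, with two of the lines above truncated before the "at ... line 3220" suffix; the names `LOG_PATTERN` and `parse_errors` are made up here:

```python
import re

# Matches the message layout seen in the log excerpt above.
LOG_PATTERN = re.compile(
    r"Error with prediction: (\S+) lend_intergenic: (\d+), rend_intergenic: (\d+)"
)

def parse_errors(log_text):
    """Return (source, lend, rend) tuples for every matching log message."""
    records = []
    for match in LOG_PATTERN.finditer(log_text):
        source = match.group(1)
        lend = int(match.group(2))
        rend = int(match.group(3))
        records.append((source, lend, rend))
    return records

log_example = (
    "Error with prediction: Helixer lend_intergenic: 27515088, rend_intergenic: 27514801\n"
    "Error with prediction: Helixer lend_intergenic: 27516650, rend_intergenic: 27516069\n"
)

records = parse_errors(log_example)
sources = {source for source, _, _ in records}
# Messages whose left end lies beyond the right end.
inverted = [r for r in records if r[1] > r[2]]
print(sources, len(inverted), "of", len(records), "have lend > rend")
```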

Hi, I have fixed it. EVM ships with some Perl scripts that convert the StringTie GTF file, and the ab initio prediction GFF3, into the GFF3 layout that EVM requires. Thank you for helping.


Interesting. So either they do not follow the format specification, or they accept only a subset of it, or they rely on specific attributes. Could you show us the output of their GTF-to-GFF3 script for the gene you showed us previously?
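For comparison, the GFF3 specification also defines an alignment style, in which each aligned block is its own feature carrying a Target attribute, and the EVM documentation shows its transcript evidence as cDNA_match features of that kind. The Python sketch below turns exon coordinates into such lines. It is not the EVM script, and the details are assumptions: one shared ID per transcript, Target coordinates counted in transcript order, and a "+" Target strand. The function name is hypothetical.

```python
def exons_to_alignment_gff3(seqid, source, strand, transcript_id, exons):
    """Turn (start, end) exon pairs into cDNA_match lines sharing one ID."""
    # Walk exons in transcript order: ascending on '+', descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    lines = []
    offset = 0  # transcript bases consumed by earlier exons
    for start, end in ordered:
        length = end - start + 1
        target = f"{transcript_id} {offset + 1} {offset + length} +"
        attributes = f"ID={transcript_id};Target={target}"
        lines.append("\t".join([
            seqid, source, "cDNA_match", str(start), str(end),
            ".", strand, ".", attributes,
        ]))
        offset += length
    return lines

# First two exons of STRG.5169.1 from the AGAT output shown earlier.
alignment_lines = exons_to_alignment_gff3(
    "ptg000009l", "StringTie", "-", "STRG.5169.1",
    [(41837, 42631), (43439, 43687)],
)
for line in alignment_lines:
    print(line)
```

If EVM does expect this style for --transcript_alignments, a gene/transcript/exon hierarchy would be valid GFF3 and still not be what the tool reads, which would fit the second of the possibilities above.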


Hi, here is the link to the Perl scripts that convert files into the EVM format:

https://github.com/EVidenceModeler/EVidenceModeler/tree/master/EvmUtils/misc