Question: When converting a GFF3 file into an EMBL file, what should be filled in as locus_tag and ID?
Asked 21 months ago by Jerryliu:

I am using EMBLmyGFF3 to convert a GFF3 file into an EMBL file, but the locus_tag and ID are required. Does anyone know where I should look for this information? And what do locus_tag and ID mean in an EMBL file?


After converting the file into EMBL format, I need to use it as the input file for another program. This is the description of that program's EMBL input:

Requirement: gene annotation in EMBL format (TriAnnot). Note: only the locus_tag and the id are required to run clariTE.pl; other tags (such as blastp_file...) are not necessary. The following is what the EMBL file looks like:

ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 1411106 BP.
XX
AC   unknown;
XX
XX
FT   CDS             join(141960..142006,142121..142147,142248..142370,142493..142739,142850..142873)
FT                   /locus_tag="v443_0002_EXONERATE_BLASTX_protOSA_6"
FT                   /blastp_file="v443_0002_EXONERATE_BLASTX_protOSA_141960_142873_Q5JM42_Match_0005_mRNA_CDS.bltp"
FT                   /id="v443_0002_EXONERATE_BLASTX_protOSA_141960_142873_Q5JM42_Match_0005_mRNA_joinedCDS"
FT                   /note="Similar_to: hypothetical_protein"
FT                   /note="BestBlastHit: B9EZI3_ORYSJ TrEMBL databank Putative uncharacterized protein - %25id: 91.67 - hcov: 13.78 - qcov: 100.00"
FT                   /note="Status: High Confidence"
FT   CDS             complement(join(143435..144154,144239..144363,145030..145267))
FT                   /locus_tag="v443_0002_EXONERATE_BLASTX_validated_9"
FT                   /expressed
FT                   /blastp_file="v443_0002_EXONERATE_BLASTX_validated_143435_145267_AFR_02_CAT01_3_Match_0001_mRNA_CDS.bltp"
FT                   /id="v443_0002_EXONERATE_BLASTX_validated_143435_145267_AFR_02_CAT01_3_Match_0001_mRNA_joinedCDS"
FT                   /note="Similar_to: putative_function - F2CSA4_HORVD TrEMBL databank Predicted protein OS Hordeum vulgare var distichum PE 2 SV 1"
FT                   /note="BestBlastHit: F2CSA4_HORVD TrEMBL databank Predicted protein - %25id: 96.12 - hcov: 100.56 - qcov: 100.00"
FT                   /note="Function_coverage: 94.71"
FT                   /note="Function_identity: 97.94"
FT                   /note="Function_target: F2CSA4 22 361"
FT                   /note="Status: High Confidence"
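Since the requirement says only locus_tag and id are needed, it may help to check a converted file for CDS features that lack either one. The following is a rough sketch, not part of EMBLmyGFF3 or clariTE: the sample records, the TEST_ identifiers and the file name are made up, and it assumes each feature location fits on a single FT line.

```shell
# Report CDS features missing /locus_tag or /id (sample data is hypothetical).
workdir2=$(mktemp -d)
cat > "$workdir2/check.embl" <<'EOF'
FT   CDS             join(10..50,80..120)
FT                   /locus_tag="TEST_0001"
FT                   /id="TEST_0001_mRNA_joinedCDS"
FT   CDS             complement(200..400)
FT                   /locus_tag="TEST_0002"
EOF

missing=$(awk '
  # Print the location of the previous feature if it was an incomplete CDS.
  function report() {
    if (in_cds && !(has_tag && has_id)) print loc
  }
  # A feature key sits in column 6 of an FT line; qualifiers are indented further.
  /^FT   [^ ]/ { report(); in_cds = ($2 == "CDS"); loc = $3; has_tag = 0; has_id = 0; next }
  /^FT +\/locus_tag=/ { has_tag = 1 }
  /^FT +\/id=/        { has_id = 1 }
  END { report() }
' "$workdir2/check.embl")
echo "CDS features missing a qualifier: $missing"
```

Here the second CDS has no /id line, so its location is the one reported. On a real file, point awk at the converted EMBL file instead of the sample.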


Please see the Parameter section of the tool manual for more info.

Comment written 21 months ago by Sej Modha

If you mean the /id from the qualifier list, it is not an accepted EMBL qualifier. It probably reflects the ID attribute in the 9th column of the GFF3 file. EMBLmyGFF3 will put the ID from your GFF3 file in a /note qualifier like this:

 /note="ID:g1.t1"

You can then turn those lines into /id qualifiers with a sed command:

sed 's/\/note="ID:/\/id="/g' myFile.embl > myReadyFile.embl
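To see what that substitution does, here is a small sketch run on a made-up feature block. It assumes the converter wrote the GFF3 ID as /note="ID:..."; the file names, the TEST_0001 locus tag and the g1.t1 identifier are only for illustration. Using | as the sed delimiter avoids escaping the slashes, and the command is otherwise equivalent to the one above.

```shell
# Create a tiny sample EMBL fragment (hypothetical content for illustration).
workdir=$(mktemp -d)
cat > "$workdir/sample.embl" <<'EOF'
FT   CDS             join(10..50,80..120)
FT                   /locus_tag="TEST_0001"
FT                   /note="ID:g1.t1"
FT                   /note="Status: High Confidence"
EOF

# Rewrite only the notes that carry the GFF3 ID; other /note lines are untouched.
sed 's|/note="ID:|/id="|' "$workdir/sample.embl" > "$workdir/ready.embl"

# Count the resulting /id qualifiers as a quick check.
id_count=$(grep -c '/id="' "$workdir/ready.embl")
echo "id qualifiers: $id_count"
cat "$workdir/ready.embl"
```

Comparing the count of /id qualifiers with the number of features you expect is a cheap way to spot features whose ID was not carried over.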

As for the locus_tag, I think you can use any value you like. If you mean the ID of the record (the accession on the first line and the AC line), I agree with Sej Modha: take the time to read the README and the associated ENA documentation; everything should be explained there.

Answer written 21 months ago by Juke34 (modified 20 months ago)