Question: When converting a GFF3 file into an EMBL file, what should be filled in as locus_tag and ID?
11 months ago, Jerryliu wrote:

I am using the software EMBLmyGFF3 to convert a GFF3 file into an EMBL file, but the locus_tag and ID are required. Does anyone know where I should look for this information? And what do locus_tag and ID mean in an EMBL file?


After converting the file into EMBL format, I need to use it as the input file for another program. This is the description of that input file in EMBL format:

Requirement: gene annotation in EMBL format (TriAnnot). Note: only the locus_tag and the id are required to run clariTE.pl; other tags (such as blastp_file...) are not necessary. The following is what the EMBL file looks like:

ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 1411106 BP.
XX
AC   unknown;
XX
XX
FT   CDS             join(141960..142006,142121..142147,142248..142370,142493..142739,142850..142873)
FT                   /locus_tag="v443_0002_EXONERATE_BLASTX_protOSA_6"
FT                   /blastp_file="v443_0002_EXONERATE_BLASTX_protOSA_141960_142873_Q5JM42_Match_0005_mRNA_CDS.bltp"
FT                   /id="v443_0002_EXONERATE_BLASTX_protOSA_141960_142873_Q5JM42_Match_0005_mRNA_joinedCDS"
FT                   /note="Similar_to: hypothetical_protein"
FT                   /note="BestBlastHit: B9EZI3_ORYSJ TrEMBL databank Putative uncharacterized protein - %25id: 91.67 - hcov: 13.78 - qcov: 100.00"
FT                   /note="Status: High Confidence"
FT   CDS             complement(join(143435..144154,144239..144363,145030..145267))
FT                   /locus_tag="v443_0002_EXONERATE_BLASTX_validated_9"
FT                   /expressed
FT                   /blastp_file="v443_0002_EXONERATE_BLASTX_validated_143435_145267_AFR_02_CAT01_3_Match_0001_mRNA_CDS.bltp"
FT                   /id="v443_0002_EXONERATE_BLASTX_validated_143435_145267_AFR_02_CAT01_3_Match_0001_mRNA_joinedCDS"
FT                   /note="Similar_to: putative_function - F2CSA4_HORVD TrEMBL databank Predicted protein OS Hordeum vulgare var distichum PE 2 SV 1"
FT                   /note="BestBlastHit: F2CSA4_HORVD TrEMBL databank Predicted protein - %25id: 96.12 - hcov: 100.56 - qcov: 100.00"
FT                   /note="Function_coverage: 94.71"
FT                   /note="Function_identity: 97.94"
FT                   /note="Function_target: F2CSA4 22 361"
FT                   /note="Status: High Confidence"

sequence genome gene • 302 views
modified 9 months ago by Biostar • written 11 months ago by Jerryliu

Please see the Parameter section of the tool manual for more info.

modified 11 months ago • written 11 months ago by Sej Modha

If you mean the /id from the qualifier list, it is not an accepted EMBL qualifier. It probably reflects the ID attribute in the 9th column of the GFF3 file. The EMBLmyGFF3 tool will put the ID from your GFF file in a /note qualifier, like this:

 /note="ID:g1.t1"

So then you can fix the lines to get an /id qualifier with a sed command:

sed 's/\/note="ID:/\/id="/g' myFile.embl > myReadyFile.embl
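To check the result before passing the file to the next tool, something like the following sketch may help. It runs the same substitution on a tiny made-up feature and then counts CDS features against /locus_tag and /id qualifiers. The sample feature, the locus tag value MYTAG_0001 and the file names are invented for illustration, and the exact layout EMBLmyGFF3 writes is assumed rather than confirmed here.

```shell
# Hypothetical sample: one CDS feature in the shape assumed above (values invented).
tmp=$(mktemp -d)
printf '%s\n' \
  'FT   CDS             join(141960..142006,142121..142147)' \
  'FT                   /locus_tag="MYTAG_0001"' \
  'FT                   /note="ID:g1.t1"' > "$tmp/sample.embl"

# Same substitution as the sed command above; '|' is used as the delimiter
# so the slashes in the qualifier names need no escaping.
sed 's|/note="ID:|/id="|' "$tmp/sample.embl" > "$tmp/ready.embl"

# Count CDS features and the two qualifiers the downstream tool needs.
n_cds=$(grep -c '^FT   CDS ' "$tmp/ready.embl")
n_tag=$(grep -c '/locus_tag="' "$tmp/ready.embl")
n_id=$(grep -c '/id="' "$tmp/ready.embl")
echo "CDS=$n_cds locus_tag=$n_tag id=$n_id"
```

On a real file, replace the sample with your own EMBL output. If the three counts differ, some CDS features are missing a locus_tag or an id (or other feature types also carry them), which is worth inspecting by hand.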

As for the locus_tag, I think you can use any value of your choice. If you mean the ID of the accession (first line and AC line), I agree with Sej Modha: take the time to read the README and the associated ENA documentation; everything should be explained there.

modified 9 months ago • written 10 months ago by Juke-34