Hi, I am wondering what the correct way is to add protein domains to CDS entries in a GenBank file.
InterProScan annotates a CDS and tells me where each protein domain lies (sometimes a hit covers the whole gene, but that's not an issue for me). Suppose I have a CDS at 330..1178, one domain at 342..1170 and a second domain at 348..1164. How should this be represented in the GenBank file? Better still, is there a simple way to do it with BioPerl?
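If the domain positions come straight from InterProScan's TSV output, they are 1-based, inclusive positions on the translated protein. They must be converted to nucleotide coordinates before they can be written as GenBank feature locations. The following is a minimal sketch of that conversion. It assumes a single-exon CDS with no introns or frameshifts, where codon 1 starts at the CDS start on the forward strand or ends at the CDS end on the reverse strand. The function names are hypothetical.

```python
def protein_to_genome(cds_start, cds_end, aa_start, aa_end, strand=1):
    """Convert a 1-based, inclusive amino-acid range to genome coordinates.

    Assumption: single-exon CDS (no introns, no frameshifts); codon 1
    occupies cds_start..cds_start+2 on the forward strand, or
    cds_end-2..cds_end on the reverse strand.
    """
    if not 1 <= aa_start <= aa_end:
        raise ValueError("expected 1 <= aa_start <= aa_end")
    if strand == 1:
        start = cds_start + 3 * (aa_start - 1)   # first base of codon aa_start
        end = cds_start + 3 * aa_end - 1         # last base of codon aa_end
    elif strand == -1:
        start = cds_end - 3 * aa_end + 1         # codons run right to left
        end = cds_end - 3 * (aa_start - 1)
    else:
        raise ValueError("strand must be 1 or -1")
    if start < cds_start or end > cds_end:
        raise ValueError("domain extends beyond the CDS")
    return start, end


def genbank_location(start, end, strand=1):
    """Render the range as a GenBank location string."""
    loc = f"{start}..{end}"
    return loc if strand == 1 else f"complement({loc})"
```

For a hypothetical hit on amino acids 5..280 of the forward-strand CDS at 330..1178, `protein_to_genome(330, 1178, 5, 280)` gives `(342, 1169)`. That becomes the location of the domain's `misc_feature`.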
I am currently doing it as shown below, but when I load the file into the Apollo genome viewer, which is my benchmark for correctness, it doesn't look right. The interface groups everything into a single misc_feature, with all the domain features merged together.
Thank you for your help!
```
LOCUS       NODE_80_length_3830_cov_32.131855 3952 bp    dna     linear   UNK
ACCESSION   unknown
FEATURES             Location/Qualifiers
     source          1..3952
                     /mol_type="genomic DNA"
                     /project="K5661"
                     /organism="XXXXXX"
     gene            330..1178
                     /locus_tag="K5661_draft_3226"
     CDS             330..1178
                     /locus_tag="K5661_draft_3226"
                     /product="Sulfate-binding protein sbp"
     misc_feature    342..1170
                     /locus_tag="K5661_draft_3226"
                     /evalue="1.2e-71"
                     /database_name="SUPERFAMILY"
                     /status="T"
                     /evidence=superfamily
                     /product="Sulfate-binding protein sbp"
                     /product="Periplasmic binding protein-like II"
                     /accession_num="SSF53850"
     misc_feature    348..1164
                     /locus_tag="K5661_draft_3226"
                     /evalue="1.9e-131"
                     /database_name="TIGRFAMs"
                     /status="T"
                     /evidence=HMMTigr
                     /product="Sulfate-binding protein sbp"
                     /product="3a0106s03: sulfate ABC transporter,
                     sulfate-bindin"
                     /accession_num="TIGR00971"
```
[etc, and ORIGIN with the sequence is correctly shown at the end]
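Each misc_feature in the record above carries two /product qualifiers and several qualifiers that, as far as I know, are not in the INSDC feature table qualifier list (/evalue, /database_name, /status, /accession_num). /evidence has also been retired in favour of /experiment and /inference. Other tools may ignore or reject these qualifiers. One conservative alternative is to give each domain its own misc_feature with only /locus_tag and a free-text /note. I can't say whether that changes how Apollo groups the features. The sketch below only handles the layout: feature keys start in column 6, locations and qualifiers start in column 22, and lines wrap at 79 characters. The `format_feature` helper and the note wording are hypothetical.

```python
import textwrap

FEATURE_INDENT = 5   # feature keys start in column 6
QUALIFIER_COL = 21   # locations and qualifiers start in column 22
LINE_WIDTH = 79


def format_feature(key, location, qualifiers):
    """Render one feature-table entry in GenBank column layout.

    qualifiers is a list of (name, value) pairs. A value of None gives a
    bare qualifier such as /pseudo. Every other value is quoted, which is a
    simplification: some qualifiers (e.g. /codon_start) are unquoted, and
    embedded double quotes are not escaped here.
    """
    lines = [" " * FEATURE_INDENT
             + key.ljust(QUALIFIER_COL - FEATURE_INDENT)
             + location]
    pad = " " * QUALIFIER_COL
    for name, value in qualifiers:
        text = f"/{name}" if value is None else f'/{name}="{value}"'
        # Wrap only at spaces so words such as "protein-like" stay intact.
        for chunk in textwrap.wrap(text, width=LINE_WIDTH - QUALIFIER_COL,
                                   break_on_hyphens=False):
            lines.append(pad + chunk)
    return "\n".join(lines)
```

For example, `format_feature("misc_feature", "342..1170", [("locus_tag", "K5661_draft_3226"), ("note", "SUPERFAMILY SSF53850: Periplasmic binding protein-like II; E-value 1.2e-71")])` emits one misc_feature per domain with the database, accession and E-value folded into /note. In BioPerl the equivalent is to attach one `Bio::SeqFeature::Generic` object (primary tag `misc_feature`) per domain to the sequence before writing it with `Bio::SeqIO`.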
OK, no help on this exact question yet. Can anyone at least point me to documentation on sub-features in a GenBank file? I can't work out from the basic GenBank documentation how to add them. GFF3 can express them but GenBank apparently can't, which doesn't make sense to me.
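GenBank's feature table is flat. There is no counterpart to GFF3's `Parent` attribute, so a domain is tied to its CDS only by a shared qualifier such as /locus_tag and by its location falling inside the CDS. The hedged sketch below makes that concrete. It rebuilds the gene, CDS and domain hierarchy from flat features and writes it as GFF3 with explicit `ID`/`Parent` links. Several parts are assumptions for illustration: the `(key, start, end, qualifiers)` tuple input, the `.cds`/`.domainN` ID scheme, the use of `polypeptide_domain` as the domain type, and forward-strand-only handling. A viewer such as Apollo may also expect an mRNA level between gene and CDS.

```python
from collections import defaultdict
from urllib.parse import quote


def _escape(value):
    # GFF3 reserves ; = & , (and %) in attribute values; keep spaces and colons readable.
    return quote(value, safe=" :")


def _row(seqid, ftype, start, end, phase, attrs):
    # Columns: seqid, source, type, start, end, score, strand, phase, attributes
    return "\t".join([seqid, ".", ftype, str(start), str(end), ".", "+", phase, attrs])


def genbank_to_gff3(seqid, features):
    """Turn flat GenBank-style features into GFF3 with ID/Parent links.

    Features sharing a /locus_tag are grouped; a misc_feature that lies
    inside that locus's CDS becomes a child of the CDS. Forward strand only.
    """
    by_tag = defaultdict(list)
    for feat in features:
        tag = feat[3].get("locus_tag")
        if tag is not None:
            by_tag[tag].append(feat)

    lines = ["##gff-version 3"]
    for tag, feats in by_tag.items():
        gene_id = _escape(tag)
        cds_id = gene_id + ".cds"
        cds = next((f for f in feats if f[0] == "CDS"), None)
        n_domains = 0
        for key, start, end, quals in feats:
            if key == "gene":
                lines.append(_row(seqid, "gene", start, end, ".", f"ID={gene_id}"))
            elif key == "CDS":
                lines.append(_row(seqid, "CDS", start, end, "0",
                                  f"ID={cds_id};Parent={gene_id}"))
            elif key == "misc_feature" and cds and cds[1] <= start and end <= cds[2]:
                n_domains += 1
                attrs = f"ID={gene_id}.domain{n_domains};Parent={cds_id}"
                if "note" in quals:
                    attrs += f";Note={_escape(quals['note'])}"
                lines.append(_row(seqid, "polypeptide_domain", start, end, ".", attrs))
    return lines
```

Run on the gene, CDS and two misc_features from the record above, this yields one gene line, one CDS line whose `Parent` is the gene, and two domain lines whose `Parent` is the CDS. That hierarchy is exactly what the GenBank file can only imply through the shared /locus_tag.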