Question: Extract subset of genes from EMBL file
Asked 2.5 years ago by Roxane Boyer (France / Toulouse / GeT-Plage):

Hello everyone!

In order to test RATT (Rapid Annotation Transfer Tool), I need an EMBL file that contains only a subset of genes of my choice. RATT uses the annotation of a closely related species and transfers it onto a species that has not been studied, so the result is a kind of approximate annotation. I'm working, again, on D. melanogaster (the close relative) and D. suzukii (the unstudied species), and I'm running RATT following this tutorial:

For example, my EMBL files contain numerous identifiers (such as FBgn0000120), the translations of the genes, many other features, and, at the end of the file, the whole nucleotide sequence of the chromosome. I want to keep only a few CDS features, for example a subset of 10 genes, and the result must still be in EMBL format.

Does anyone have an idea of how to do this? I did some research but didn't find anything useful.

Thanks for your help!


Tags: tool, script, embl, ratt

If you are familiar with Perl, you can use BioPerl to parse the EMBL file; the BioPerl Beginners HOWTO should help you write the script quickly.
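
If you would rather not install BioPerl, the same idea can be sketched with only the Python standard library (Python is swapped in for Perl here). This is a minimal sketch that assumes the usual EMBL layout: feature-table lines start with `FT`, a new feature has its key starting in column 6, and qualifier lines are indented further. It keeps every non-`FT` line (header, `SQ` line, sequence, `//`) plus the `source` feature, and only those CDS features whose text mentions one of the wanted FlyBase IDs. The function name `filter_embl_features` is hypothetical, not something from the original post.

```python
from typing import Iterable, List, Set


def filter_embl_features(lines: Iterable[str], wanted_ids: Set[str],
                         keep_keys: Set[str] = frozenset({"CDS"})) -> List[str]:
    """Return the lines of an EMBL entry, keeping only selected features.

    Non-FT lines (ID, AC, FH, SQ, sequence, '//') are kept unchanged.
    In the FT block, the 'source' feature is always kept; any other
    feature is kept only if its key is in keep_keys AND its text
    mentions one of wanted_ids (e.g. in a /db_xref or /gene qualifier).
    """
    out: List[str] = []
    feature: List[str] = []  # lines of the feature currently being read

    def flush() -> None:
        # Decide whether the buffered feature is written out, then reset it.
        if feature:
            key = feature[0][5:].split()[0]  # first token after column 5
            text = "".join(feature)
            if key == "source" or (
                    key in keep_keys and any(i in text for i in wanted_ids)):
                out.extend(feature)
            feature.clear()

    for line in lines:
        if line.startswith("FT"):
            # A non-blank character in column 6 marks the start of a new feature.
            if len(line) > 5 and line[5] != " ":
                flush()
            feature.append(line)
        else:
            flush()  # leaving the feature table (or not yet in it)
            out.append(line)
    flush()
    return out
```

For example, `sys.stdout.writelines(filter_embl_features(open("chr2L.embl"), {"FBgn0000120"}))` (hypothetical file name) would print the filtered entry. Pass `keep_keys={"CDS", "gene"}` if you also want the matching gene features. Matching on the whole feature text is deliberately loose, so check that the CDS features in your file actually carry the FBgn identifier in one of their qualifiers.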

Another quick-and-dirty option is to parse the file directly: set Perl's input record separator so the file is read one record at a time (something like $/ = "\n//";), then print only the records you want.
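
The record-separator trick selects whole entries (each EMBL entry ends with a `//` line), so it helps when the genes you want are stored as separate entries; if the file holds one chromosome-length entry, that single entry is either kept or dropped as a whole. Below is a minimal stdlib Python sketch of the same idea, with Python standing in for Perl; the names `iter_records` and `select_records`, and matching the IDs anywhere in the entry text, are assumptions of this sketch.

```python
from typing import Iterable, Iterator, List, Set


def iter_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield EMBL entries one at a time; each entry ends with a '//' line."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if line.rstrip("\r\n") == "//":
            yield record
            record = []
    if any(l.strip() for l in record):
        yield record  # trailing entry without a '//' terminator


def select_records(lines: Iterable[str], wanted_ids: Set[str]) -> List[str]:
    """Keep only the entries whose text mentions one of wanted_ids."""
    out: List[str] = []
    for record in iter_records(lines):
        text = "".join(record)
        if any(i in text for i in wanted_ids):
            out.extend(record)
    return out
```

This mirrors a Perl loop such as `while (<>) { print if /FBgn0000120/ }` run with `$/` set to the entry terminator: each iteration sees one full entry rather than one line.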

Reply written 2.5 years ago by h.mon