Question: Extract subset of genes from EMBL file
Roxane Boyer (France / Toulouse / GeT-Plage) wrote, 2.0 years ago:

Hello everyone!

To test the tool RATT (Rapid Annotation Transfer Tool: a tool that uses the annotation of a closely related species to transfer it onto a species that has not been studied, giving an approximate annotation; RATT is used following this tutorial: http://avrilomics.blogspot.fr/2013/04/training-augustus-gene-finding-software.html ), I need an EMBL file that contains only a subset of genes of my choice. I'm working, again, on D. melanogaster (the close relative) and D. suzukii (the non-studied species).

For example, my EMBL files contain numerous identifiers such as FBgn0000120, then the translations of the genes along with many other features, and, at the end of the file, the complete nucleotide sequence of the chromosome. I want to keep only a few CDS features, for instance a subset of 10 genes, and the result must still be in EMBL format.
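For illustration, here is a minimal pure-Python sketch of that feature-level filtering, under stated assumptions. It assumes a standard EMBL flat file: two-letter line codes, feature keys starting in column 6, qualifier lines indented to column 22. It selects a feature when one of the wanted identifiers (e.g. FBgn0000120 in a /db_xref qualifier) appears as a whole word anywhere in the feature block, and it always keeps the source feature (an assumption, not a documented RATT requirement). The function name filter_embl_features and its parameters are hypothetical. Every non-feature line, including the full chromosome sequence, is copied unchanged so the coordinates of the kept features stay valid.

```python
import re
from typing import Iterable, List, Optional, Set


def filter_embl_features(lines: Iterable[str], wanted_ids: Iterable[str],
                         keys: Optional[Set[str]] = None,
                         always_keep=("source",)) -> List[str]:
    """Return the EMBL lines, keeping only feature blocks that mention a wanted ID.

    Hypothetical helper. Non-FT lines (ID, header, FH, SQ, sequence, //) are copied as-is.
    If `keys` is given (e.g. {"CDS"}), only features with those keys can be kept.
    """
    patterns = [re.compile(r"\b" + re.escape(i) + r"\b") for i in wanted_ids]
    out: List[str] = []
    block: List[str] = []  # lines of the feature currently being read

    def flush() -> None:
        if not block:
            return
        first = block[0]
        # The feature key starts in column 6 (index 5); blank means a stray continuation.
        key = first[5:].split()[0] if first[5:6].strip() else ""
        text = "".join(block)
        keep = key in always_keep or (
            (keys is None or key in keys)
            and any(p.search(text) for p in patterns))
        if keep:
            out.extend(block)
        block.clear()

    for line in lines:
        if line.startswith("FT"):
            if line[5:6].strip():  # non-blank column 6 => a new feature begins
                flush()
            block.append(line)
        else:
            flush()  # leaving the feature table (or any non-FT line)
            out.append(line)
    flush()
    return out
```

Usage might look like: with open("in.embl") as fin, open("out.embl", "w") as fout: fout.writelines(filter_embl_features(fin, ["FBgn0000120"], keys={"CDS"})) (file names hypothetical). Because the match is a whole-word search over the whole block, a stricter version could restrict it to the /db_xref or /gene qualifiers.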

Does anyone have an idea how to do this? I did some searching but didn't find anything useful.

Thanks for your help !

Roxane


If you are familiar with Perl, you can use BioPerl to parse the EMBL file; the BioPerl Beginners HOWTO should help you write the script quickly.

Another quick-and-dirty option is to parse the file directly, using Perl's input record separator to read the file one record at a time (something like $/ = "\n//";), and then print only the records you want.
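The record-by-record idea above is described for Perl; as a rough equivalent, here is a Python sketch. Python has no $/ input record separator, so it collects lines until one consisting only of // (the EMBL entry terminator). The function names are hypothetical, and the sketch assumes each wanted gene is a separate entry whose primary identifier is the first token of its ID line.

```python
from typing import Iterable, Iterator, List


def iter_embl_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each EMBL entry as a list of lines, '//' terminator included."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if line.rstrip("\r\n") == "//":  # '//' alone on a line ends an entry
            yield record
            record = []
    if any(ln.strip() for ln in record):  # trailing entry with no '//'
        yield record


def record_id(record: List[str]) -> str:
    """Return the first token of the ID line, without its trailing ';'."""
    for line in record:
        if line.startswith("ID   "):
            return line[5:].split()[0].rstrip(";")
    return ""


def select_records(lines: Iterable[str], wanted_ids: Iterable[str]) -> Iterator[str]:
    """Yield the lines of every entry whose ID is in wanted_ids."""
    wanted = set(wanted_ids)
    for record in iter_embl_records(lines):
        if record_id(record) in wanted:
            yield from record
```

This only helps when the genes are stored as separate entries. If the whole chromosome is a single entry, as described in the question, the filtering has to happen at the feature level inside that entry instead.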

Reply written 2.0 years ago (modified 2.0 years ago) by h.mon