Extract subset of genes from EMBL file
7.1 years ago
Rox ★ 1.4k

Hello everyone!

I want to test RATT (Rapid Annotation Transfer Tool), a tool that uses the annotation of a closely related species and transfers it onto an unannotated species, giving an approximate annotation. I am using RATT following this tutorial: http://avrilomics.blogspot.fr/2013/04/training-augustus-gene-finding-software.html. For this test I need an EMBL file that contains only a gene subset of my choice. Once again I'm working on D. melanogaster (the close relative) and D. suzukii (the unannotated species).

For example, my EMBL files contain numerous identifiers, like FBgn0000120, then the translations of the genes and many other features, and at the end of the file, the whole nucleotide sequence of the chromosome. I want to keep only a few CDS features, for example a subset of 10 genes, and the result must still be in EMBL format.

Does anyone have an idea of how to do this? I did some searching but didn't find anything useful.

Thanks for your help!

Roxane

EMBL RATT • 1.4k views

If you are familiar with Perl, you can use BioPerl to parse the EMBL file; the Beginners How-To should help you write the script quickly.

Another quick and dirty option is to parse the file directly, using Perl's input record separator to read the file record by record (something like $/ = "\n//";), and then output only the records you want.
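The same quick-and-dirty idea can be sketched in plain Python instead of Perl. This is only a rough sketch under some assumptions: records end with a `//` line, feature lines start with `FT` with the feature key beginning at column 6, and the wanted identifiers (such as FBgn0000120) appear somewhere in the qualifiers of the features to keep. The function names `split_records` and `filter_features` are made up for illustration. Because the question describes a whole chromosome in one record, the sketch also filters at the feature level and leaves the header, the `source` feature, and the sequence untouched so that coordinates stay valid.

```python
def split_records(text):
    """Split EMBL text into records, each ending with its '//' terminator line."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.rstrip() == "//":
            records.append("".join(current))
            current = []
    # Keep any trailing content that lacks a terminator.
    if "".join(current).strip():
        records.append("".join(current))
    return records


def filter_features(record, wanted_ids):
    """Keep non-FT lines, the 'source' feature, and features mentioning a wanted ID."""
    out = []
    block = []  # lines of the feature currently being read

    def flush():
        if not block:
            return
        text = "".join(block)
        parts = block[0][5:].split()
        key = parts[0] if parts else ""
        # Plain substring match: assumes IDs are fixed-length, like FBgn IDs.
        if key == "source" or any(wanted in text for wanted in wanted_ids):
            out.append(text)
        block.clear()

    for line in record.splitlines(keepends=True):
        if line.startswith("FT"):
            # A non-blank character at column 6 starts a new feature;
            # otherwise the line continues the current feature.
            if line[5:6].strip():
                flush()
            block.append(line)
        else:
            flush()
            out.append(line)
    flush()
    return "".join(out)
```

To use it, one could read the EMBL file into a string, call `split_records` on it, apply `filter_features(record, {"FBgn0000120"})` to each record, and write the results to a new file. Note that features which do not carry the identifier themselves (for instance an mRNA feature without the matching qualifier) would be dropped, so it is worth checking the output against what RATT expects.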
