Perl Module For Parsing Ensembl Annotation Flat File
12.1 years ago
Casual ▴ 90

Hi there. I have a batch of gene IDs and have downloaded the annotation flat files from Ensembl (for the same species, of course). I want to extract the description (a brief summary of gene function) from the annotation file for each of the IDs. Could anyone suggest a Perl module for this job? Other software would be fine too. I appreciate your help!

ensembl annotation • 3.4k views

I guess BioPerl would do the job.

The key question here is: what exactly do you mean by "annotation flat files"? If the task is just extracting a couple of fields from a delimited text file, use "cut" instead of writing a script.
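
If the file does turn out to be plain tab-separated text, the lookup step that "cut" cannot do on its own (keeping only the rows for a given list of IDs) might look roughly like the sketch below. It is only an illustration: the column positions, the function names `load_descriptions` and `describe`, and the idea that the gene ID sits in the first column with the description in the second are all assumptions, not something the thread confirms about any particular Ensembl file.

```python
import csv


def load_descriptions(handle, id_column=0, description_column=1):
    """Build a gene ID -> description mapping from a tab-separated file handle.

    The column indices are assumptions; adjust them to the actual file layout.
    """
    descriptions = {}
    # QUOTE_NONE keeps quote characters inside descriptions untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= max(id_column, description_column):
            continue  # skip blank or truncated lines
        descriptions[row[id_column]] = row[description_column]
    return descriptions


def describe(gene_ids, descriptions):
    """Return (gene ID, description) pairs, with "" for IDs not in the file."""
    return [(gene_id, descriptions.get(gene_id, "")) for gene_id in gene_ids]
```

Opening the file with `open(path, newline="")`, passing the handle to `load_descriptions`, and then calling `describe` with the batch of IDs would give one pair per ID, in the order the IDs were supplied.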

I know BioPerl, and I did a lot of searching on CPAN.

12.1 years ago
Michael 54k

There is no single Ensembl format, which is possibly why your search didn't return anything usable (you did search search.cpan.org for Perl modules, didn't you? ;).

You probably meant:

Thank you! These links really helped me. Although I didn't succeed in parsing the Ensembl-format annotation file (I'm confused about which part is a feature and which is a tag), I finally got the job done by downloading the GenBank-format annotation file and writing a BioPerl script for it. It's much easier.

Glad it helped, but which Ensembl-format file are you referring to?

12.0 years ago

You may want to consider using the Ensembl Perl API to get all the information you want for a gene, given a gene stable ID. Not only would you avoid any trouble with formats, but you would also be able to get much more (cross-references, GO annotations, location, etc.).

Here are a few lines of code on how to do this:

[?]

You can easily extract other information in addition to the description.
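
For anyone who would rather not install the Perl API, a similar lookup can be sketched against the public Ensembl REST service instead. To be clear, this is a different technique from the Perl API code this answer refers to, and it is only a minimal sketch: the helper names `lookup_url` and `fetch_description` are made up for illustration, and it assumes the `lookup/id` endpoint on rest.ensembl.org returns a JSON record with a `description` field for the given stable ID.

```python
import json
import urllib.request
from urllib.parse import quote

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(stable_id):
    """Build the REST URL that looks up one stable ID and asks for JSON."""
    return f"{ENSEMBL_REST}/lookup/id/{quote(stable_id)}?content-type=application/json"


def fetch_description(stable_id, urlopen=urllib.request.urlopen):
    """Return the description for a stable ID, or None if the record has none.

    `urlopen` is a parameter only so the network call can be swapped out
    (for example, in a test); by default it performs a real HTTP request.
    """
    with urlopen(lookup_url(stable_id)) as response:
        record = json.loads(response.read().decode("utf-8"))
    return record.get("description")
```

Calling `fetch_description` once per gene ID in the batch would collect the descriptions; for a long list it would be polite to pause between requests, since the public server limits request rates.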

Interesting. So this module is not on CPAN or in the BioPerl project? It can only be found at Ensembl?

That is right. The main reason is that the Ensembl API is updated for every single Ensembl release, which is why it makes more sense to keep it separate from BioPerl.
