Question

Implement ensembl gene annotation pipeline for my assembly

3

5.4 years ago

Mbillah ▴ 140

I'm new to gene annotation. Can anyone help me implement the Ensembl gene annotation pipeline for my data? Is Ensembl web-based only, or is there a Linux package? Can I run it on my own server, or only through their website? Can anyone point me to a tutorial?

TIA

gene annotation ensembl • 2.3k views

ADD COMMENT • link updated 5.4 years ago by WouterDeCoster 47k • written 5.4 years ago by Mbillah ▴ 140

1

It is unclear which data you have and what you aim to obtain. Please elaborate (e.g. on file formats) and be specific.

ADD REPLY • link 5.4 years ago by WouterDeCoster 47k

0

I have paired reads, contigs, scaffolds and a GFF file, and now I want to annotate the genes: protein-coding genes, small non-coding genes, long non-coding genes, other non-coding genes, pseudogenes and gene transcripts.

ADD REPLY • link 5.4 years ago by Mbillah ▴ 140

1

Maybe this repo suits you, although it isn't finished yet: https://github.com/Ensembl/ensembl-annotation

ADD REPLY • link 5.4 years ago by hsiaoyi0504 ▴ 70

1

Actually, I don't understand how to get started. Can you tell me where to begin? Can you also please explain this command:

find . -name '*.p[l|m]' -exec perltidy -pro=perltidyrc -b {} \;

ADD REPLY • link 5.4 years ago by Mbillah ▴ 140

1

Check this https://www.linode.com/docs/tools-reference/tools/find-files-in-linux-using-the-command-line/
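For what it's worth, the quoted `find` command is a code-formatting step and not part of running an annotation: it searches the current directory recursively for names matching `*.p[l|m]` (Perl scripts and modules) and runs `perltidy` on each match, which, as far as I know, reads its settings from the file given by `-pro=` and rewrites files in place with `-b`, keeping a backup. A minimal Python sketch of just the file-matching part, assuming you only want to see which files would be touched (the function name `find_perl_sources` is made up for illustration):

```python
import fnmatch
import os

# Same glob as in the quoted command. Inside [...] the "|" is a literal
# character, not "or", so the class matches "l", "|" or "m".
# "*.p[lm]" would express the intent (".pl" or ".pm") more precisely.
PATTERN = "*.p[l|m]"


def find_perl_sources(root="."):
    """Return paths of files under root whose name matches PATTERN.

    Rough equivalent of `find root -name '*.p[l|m]'`, restricted to files.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # fnmatchcase is case-sensitive on every platform, like find -name
            if fnmatch.fnmatchcase(name, PATTERN):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # List the files that the find command would hand to perltidy
    for path in find_perl_sources("."):
        print(path)
```

Running this in the repository root only lists the matching files; it does not run `perltidy` or change anything.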

ADD REPLY • link 5.4 years ago by hsiaoyi0504 ▴ 70

0

So what you have is an assembly and a GFF file. Please edit your post to make this clearer.

ADD REPLY • link 5.4 years ago by WouterDeCoster 47k

score 5 · Answer 1 · 2018-11-11

There is currently no easy or streamlined way to install the Ensembl annotation pipeline locally, so I do not recommend even attempting this as a beginner. It doesn't have to stay that way: Ensembl and EBI have been working on a distributed Ensembl infrastructure within Elixir, involving the EBI, Elixir-Norway and Sweden. Possibly, part of the outcome will be a documented Docker container that runs the whole annotation pipeline. Have a look at the webinar to see whether you might be interested in testing it out anyway. If you want, I can try to find out more about the current state of the Ensembl Docker images.

In the meantime, I recommend using the MAKER2 pipeline.

Update:

Unfortunately, it is unlikely that there will be an installable Ensembl annotation pipeline, in a Docker container or otherwise, in the foreseeable future. The efforts towards distributed Ensembl have mainly focussed on the services, like the genome browser and back-end. In summary, only Ensembl itself can run the Ensembl annotation pipeline. Also, the Ensembl annotation pipeline relies heavily on protein evidence, whereas in your case you might mostly have RNA-seq evidence. For that kind of data, MAKER is more suitable.

score 0 · Answer 2 · 2018-11-11

0

5.4 years ago

EagleEye 7.5k

You may convert your Ensembl GTF into a gene-based annotation table (tab-delimited). You can then import this simple table into R or just use Linux command-line tools to annotate your results.

Check out this post A: extract only geneID and gene symbol from GTF file
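As a rough illustration of the GTF-to-table conversion described above, here is a minimal Python sketch. It assumes a standard nine-column, tab-separated GTF with `gene` feature lines and the `gene_id`, `gene_name` and `gene_biotype` attributes that Ensembl GTFs normally carry. The function name `gtf_to_gene_table` and the chosen output columns are my own choices for illustration, not something from the linked post.

```python
import csv
import re
import sys

# GTF attributes look like: gene_id "G1"; gene_name "abc"; gene_biotype "protein_coding";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

HEADER = ("gene_id", "gene_name", "gene_biotype", "chrom", "start", "end", "strand")


def gtf_to_gene_table(lines):
    """Yield one tuple per 'gene' feature line, in the column order of HEADER."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep gene records only (no transcripts, exons, ...)
        attrs = dict(ATTR_RE.findall(fields[8]))
        yield (
            attrs.get("gene_id", ""),
            attrs.get("gene_name", ""),      # may be missing for some genes
            attrs.get("gene_biotype", ""),
            fields[0],                       # chromosome / scaffold name
            fields[3],                       # start (1-based)
            fields[4],                       # end
            fields[6],                       # strand
        )


if __name__ == "__main__":
    # Usage (hypothetical file names): python gtf_to_table.py annotation.gtf > genes.tsv
    with open(sys.argv[1]) as handle:
        writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(gtf_to_gene_table(handle))
```

The resulting tab-delimited file can be read in R with `read.delim` or joined to other results with standard command-line tools.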

ADD COMMENT • link 5.4 years ago by EagleEye 7.5k