Question: Maker output - Easy way to find actual gene names?
3.5 years ago by
voidnyx10 wrote:



I used the MAKER pipeline for gene prediction on my RNA-seq data, and the pipeline ran fine. The sequences in its output are named like this:

>maker-contig-dpp-500-500-exonerate_est2genome-gene-0.0-mRNA-3 protein AED:0.24 eAED:0.24 QI:184|1|1|1|0|0|3|1049|588


As you can see, it's not really possible to tell which gene or protein this might be. Is there an easy way to find out which known genes my predicted genes correspond to? Is there an existing program or script for this?


This has to be a common problem, so there have to be solutions available, right?
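
For reference, the header shown above is structured even though it carries no gene name. A minimal Python sketch, assuming the header always has the form of the one example in this post (an ID followed by whitespace-separated `key:value` tokens), could pull the fields apart like this. The function name `parse_maker_header` is hypothetical, and the QI values are kept as raw strings because their meaning is not given in this thread.

```python
def parse_maker_header(header: str) -> dict:
    """Split a MAKER-style FASTA header into its ID and key:value fields."""
    fields = header.lstrip(">").split()
    record = {"id": fields[0]}
    for token in fields[1:]:
        # Tokens without a colon (such as the bare word "protein") are skipped.
        if ":" in token:
            key, value = token.split(":", 1)
            record[key] = value
    # AED and eAED are numeric scores; convert them when present.
    for key in ("AED", "eAED"):
        if key in record:
            record[key] = float(record[key])
    # QI is a '|'-separated list; keep the individual values as strings.
    if "QI" in record:
        record["QI"] = record["QI"].split("|")
    return record


header = (">maker-contig-dpp-500-500-exonerate_est2genome-gene-0.0-mRNA-3 "
          "protein AED:0.24 eAED:0.24 QI:184|1|1|1|0|0|3|1049|588")
info = parse_maker_header(header)
print(info["id"], info["AED"], info["QI"])
```

This only makes the existing header easier to filter on; it does not by itself say which known gene the model resembles.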

ADD COMMENTlink modified 3.5 years ago by dandence0 • written 3.5 years ago by voidnyx10

Are you annotating a new genome? Look at a few papers doing that kind of work; the supplementary data is always useful. Genome Biology in particular is worth reading: the latest issue has the cheetah genome.

ADD REPLYlink written 3.5 years ago by cyril-cros890
3.5 years ago by
Chris Fields2.1k
University of Illinois Urbana-Champaign
Chris Fields2.1k wrote:

Gene prediction is not quite the same as gene annotation (i.e. adding information to a gene). MAKER is an evidence-based annotation pipeline that can use gene predictions and other evidence (including protein similarity and transcriptome information) to generate a set of potential gene models. However, there are optional steps downstream of the initial gene prediction that add further annotation, such as what each protein is most similar to (using BLASTP against UniProt to add descriptions) and protein domain information (via InterProScan). See the scripts maker_functional_gff, maker_functional_fasta, and ipr_update_gff, and also this site.
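
To illustrate the BLASTP step mentioned above, here is a rough Python sketch that keeps the best-scoring hit per predicted protein from BLAST tabular output (the standard 12-column `-outfmt 6` layout). It is not what the MAKER scripts do internally, just a way to look at the hits by hand. The function name `best_hits`, the query IDs, and the `HYPOTHETICAL_HIT_*` subject IDs are made up for the example; real UniProt hits would appear in the `sseqid` column instead.

```python
import csv
import io

# Column names of BLAST's default tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: (subject_id, evalue, bitscore)}, keeping the
    highest-bitscore hit for each query."""
    best = {}
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        query = row["qseqid"]
        score = float(row["bitscore"])
        if query not in best or score > best[query][2]:
            best[query] = (row["sseqid"], float(row["evalue"]), score)
    return best


# Made-up rows standing in for a real BLASTP result file.
rows = [
    ["gene-0.0-mRNA-3", "HYPOTHETICAL_HIT_A", "98.5", "588", "9", "0",
     "1", "588", "1", "588", "0.0", "1150"],
    ["gene-0.0-mRNA-3", "HYPOTHETICAL_HIT_B", "61.2", "540", "200", "4",
     "20", "560", "15", "552", "1e-150", "640"],
    ["gene-0.1-mRNA-1", "HYPOTHETICAL_HIT_C", "45.0", "300", "160", "5",
     "1", "300", "10", "309", "2e-60", "210"],
]
sample = io.StringIO("\n".join("\t".join(r) for r in rows))

hits = best_hits(sample)
for query, (subject, evalue, bitscore) in hits.items():
    print(query, subject, evalue, bitscore)
```

With a real result file, `open("blastp_results.tsv")` (a hypothetical filename) would replace the `io.StringIO` sample. Taking the top bitscore is a simple heuristic; an e-value cutoff is usually worth adding before trusting a description.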

ADD COMMENTlink written 3.5 years ago by Chris Fields2.1k
3.5 years ago by
United States
dandence0 wrote:

In addition to Chris' response, I think I need to point out that MAKER is a pipeline for genome annotation and not transcriptome annotation.  There are many differences between the two problems, including expectations about the size of the segments you are annotating and expectations about gene completeness. 

So I'd recommend against using MAKER to get gene models from your RNAseq data. The people behind Trinity have a transcriptome annotation pipeline called TransDecoder that I've used to annotate Trinity assemblies of RNAseq data.

That being said, if you want to find out what the genes MAKER annotated are, then the resources that Chris pointed out (InterProScan, BLASTP against UniProt) are what we usually use to identify orthology.

ADD COMMENTlink written 3.5 years ago by dandence0

Just a slight correction to @dandence's useful suggestion: TransDecoder is actually used for predicting coding regions in the transcripts; for annotation you should use Trinotate.

ADD REPLYlink written 3.5 years ago by apt.university70

Heh, I completely missed the fact that this was RNA-seq! I also use TransDecoder and Trinotate.

ADD REPLYlink written 3.5 years ago by Chris Fields2.1k