Hi,

I used the MAKER pipeline for gene prediction on my RNA-seq data, and it ran fine. The sequences in the output it generated are named like this:

>maker-contig-dpp-500-500-exonerate_est2genome-gene-0.0-mRNA-3 protein AED:0.24 eAED:0.24 QI:184|1|1|1|0|0|3|1049|588


As you can see, it's not possible to tell which gene/protein this might be. Is there an easy way to find out which known genes my predicted genes correspond to? Is there an existing program or script for this?

This has to be a common problem, so there must be solutions available, right?

Gene prediction is not quite the same as gene annotation (i.e. adding information to a gene). MAKER is an evidence-based annotation pipeline that can use gene predictions and additional evidence (including protein similarity and transcriptome information) to generate a set of potential gene models. However, there are optional steps downstream of the initial gene prediction that add further annotation, such as what each protein is most similar to (using BLASTP against UniProt to add descriptions) and protein domain information (via InterProScan). See the scripts maker_functional_gff, maker_functional_fasta, and ipr_update_gff, and also this site.
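
To give a rough idea of the "most similar protein" step, here is a minimal Python sketch. It assumes you have already run BLASTP of your MAKER proteins against UniProt and saved the result in the default tabular format (-outfmt 6). The function name best_hits and the subject IDs in the example are made up for illustration; this is not what the MAKER scripts do internally, just a way to see the top hit per predicted protein.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: (subject_id, evalue, bitscore)} for the top-scoring hit per query.

    best_hits is a hypothetical helper name, not part of MAKER or BLAST.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(COLUMNS, row))
        score = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or score > best[query][2]:
            best[query] = (rec["sseqid"], float(rec["evalue"]), score)
    return best


# Made-up rows: the query ID is the one from the question,
# the subject IDs are placeholders, not real UniProt entries.
query_id = "maker-contig-dpp-500-500-exonerate_est2genome-gene-0.0-mRNA-3"
rows = [
    [query_id, "sp|X00001|HYPO1_EXAMPLE", "45.0", "300", "150", "5",
     "1", "300", "10", "310", "1e-50", "180.0"],
    [query_id, "sp|X00002|HYPO2_EXAMPLE", "60.0", "580", "200", "3",
     "1", "580", "1", "580", "1e-120", "250.0"],
]
example = "\n".join("\t".join(r) for r in rows) + "\n"

for query, (subject, evalue, bitscore) in best_hits(io.StringIO(example)).items():
    print(query, subject, evalue, bitscore, sep="\t")
```

For a real run you would pass an open file handle for your BLAST output instead of the io.StringIO example. Picking the hit by bitscore is just one reasonable choice; you may also want an e-value cutoff before trusting a description.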

In addition to Chris' response, I think I need to point out that MAKER is a pipeline for genome annotation and not transcriptome annotation. There are many differences between the two problems, including expectations about the size of the segments you are annotating and expectations about gene completeness.

So I'd recommend against using MAKER to get gene models from your RNA-seq data. The people behind Trinity have a transcriptome annotation pipeline called TransDecoder that I've used to annotate Trinity assemblies of RNA-seq data.

That being said, if you want to find out what the genes MAKER annotated are, the resources that Chris pointed out (InterProScan, BLASTP against UniProt) are what we usually use to identify orthology.

Just a slight correction to @dandence's useful suggestion. TransDecoder is actually used for predicting coding regions in the transcripts; for annotation you should use Trinotate.

Heh, I completely missed the fact this was RNA-Seq! I also use TransDecoder and Trinotate.