Maker output - Easy way to find actual gene names?
8.4 years ago
voidnyx ▴ 10

Hi,

I used the MAKER pipeline for gene prediction on my RNA-seq data, and the pipeline ran fine. The sequences in the output it generated are named like this:

>maker-contig-dpp-500-500-exonerate_est2genome-gene-0.0-mRNA-3 protein AED:0.24 eAED:0.24 QI:184|1|1|1|0|0|3|1049|588

As you can see, it's not possible to tell from the name what gene/protein this might be. Is there an easy way to find out which known genes my predicted genes correspond to? Is there an existing program or script for this?

This has to be a common problem, so there have to be solutions available, right?

maker gene-prediction • 4.9k views

Are you annotating a new genome? Look at a few papers doing that kind of work; the supplementary data is usually helpful. Genome Biology in particular is worth reading. The latest issue includes the cheetah genome: http://www.genomebiology.com/2015/16/1/277

8.4 years ago
Chris Fields ★ 2.2k

Gene prediction is not quite the same as gene annotation (e.g. adding information to a gene). MAKER is an evidence-based annotation pipeline which can use gene predictions and additional evidence (including protein similarity and transcriptome information) to generate a set of potential gene models. However, there are optional steps downstream of the initial gene prediction that add further annotation, such as what the protein is most similar to (using BLASTP against UniProt to add descriptions) and protein domain information (via InterProScan). See the scripts maker_functional_gff, maker_functional_fasta, and ipr_update_gff, and also this site.
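
To make the BLASTP step a bit more concrete, here is a minimal Python sketch of the general idea: take the best hit per predicted protein and append it to the FASTA header. This is not how the MAKER scripts are implemented, just an illustration. It assumes BLAST+ tabular output (-outfmt 6) with the standard 12 columns, where the query ID is the first token of the MAKER FASTA header. The function names best_hits and annotate_fasta are my own, and the subject IDs in the demo are placeholders, not real UniProt entries.

```python
import csv
import io


def best_hits(blast_tab):
    """Return {query_id: (subject_id, bitscore)}, keeping the top bitscore per query.

    Assumes BLAST+ tabular output (-outfmt 6): column 1 is the query ID,
    column 2 the subject ID, and column 12 the bitscore.
    """
    best = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def annotate_fasta(fasta, hits):
    """Yield FASTA lines, appending the best-hit subject ID to matching headers."""
    for line in fasta:
        line = line.rstrip("\n")
        if line.startswith(">"):
            seq_id = line[1:].split()[0]  # first token of the header is the ID
            if seq_id in hits:
                line += f" Similar to: {hits[seq_id][0]}"
        yield line


if __name__ == "__main__":
    # Placeholder data for illustration only (hypothetical subject IDs).
    name = "maker-contig-dpp-500-500-exonerate_est2genome-gene-0.0-mRNA-3"
    blast = io.StringIO(
        f"{name}\tPLACEHOLDER_A\t45.0\t200\t110\t0\t1\t200\t1\t200\t1e-50\t180\n"
        f"{name}\tPLACEHOLDER_B\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-80\t250\n"
    )
    fasta = io.StringIO(f">{name} protein AED:0.24 eAED:0.24\nMKT\n")
    for out_line in annotate_fasta(fasta, best_hits(blast)):
        print(out_line)
```

In practice you would open the real BLAST report and the MAKER protein FASTA instead of the in-memory strings. Note that plain -outfmt 6 only gives subject IDs; getting a human-readable description needs either a custom output format or a lookup against the database headers, which is the part the MAKER scripts handle for you.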

8.4 years ago
dandence • 0

In addition to Chris' response, I think I need to point out that MAKER is a pipeline for genome annotation and not transcriptome annotation. There are many differences between the two problems, including expectations about the size of the segments you are annotating and expectations about gene completeness.

So I'd recommend against using MAKER to get gene models from your RNAseq data. The people behind Trinity have a transcriptome annotation pipeline called TransDecoder that I've used to annotate Trinity assemblies of RNAseq data.

That being said, if you want to find out what the genes MAKER annotated are, the resources that Chris pointed out (InterProScan, BLASTP against UniProt) are what we usually use to identify orthology.


Just a slight correction to @dandence's useful suggestion: TransDecoder is actually used for predicting coding regions in the transcripts; for annotation you should use Trinotate.


Heh, I completely missed the fact this was RNA-Seq! I also use TransDecoder and Trinotate.
