Question

Adding new evidence to a Maker annotation job

4.1 years ago

devon.orourke ▴ 50

Hi all,

I annotated a vertebrate genome last month with Maker (v-3.01), using a combination of three evidence sets: one same-species transcriptome fasta, one alternative-species transcriptome fasta, and one protein fasta file. Earlier this week I stumbled across two new same-species transcriptome datasets, assembled them, and want to add them to a new Maker run. My confusion rests on the assumption (possibly wrong!) that a larger set of same-species transcriptome evidence would produce more transcripts than a subset of that same evidence. That isn't what I've found at all: when I added these two new datasets, the total number of transcripts was _lower_ than when I used a single transcriptome.

I'm hopeful that someone can review my commands below and tell me where I went astray. Is there a standard way to add to a previous Maker run that I'm missing? Are there parameters I failed to invoke?

The logical place for the already completed GFF seemed to be the Re-annotation Using MAKER Derived GFF3 section of the maker_opts.ctl file, so I passed the already aligned transcriptome and protein data from the first round of Maker to the maker_gff parameter (this was the output of the initial round of Maker after running the _gff3_merge -d path/to/index.log_ command). I added the two new fasta files in the EST Evidence section as a comma-separated list. Strangely enough, I ended up with about 2,500 _fewer_ transcripts (from about 21k in round 1 to 18.5k in this reannotation run).
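For anyone wanting to reproduce the comparison, here is a minimal sketch of one way to count transcripts in a merged GFF3 file. It assumes that "transcripts" means records whose type column is mRNA, which may not be how the totals above were obtained; the function name and the example records are made up for illustration.

```python
def count_features(gff_lines, feature_type="mRNA"):
    """Count GFF3 records whose type column (column 3) equals feature_type."""
    count = 0
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more feature records
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == feature_type:
            count += 1
    return count


# Tiny illustrative input (made-up records, not real Maker output).
example = [
    "##gff-version 3\n",
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-RA;Parent=g1\n",
    "chr1\tmaker\tmRNA\t100\t800\t.\t+\t.\tID=g1-RB;Parent=g1\n",
    "chr1\test2genome\texpressed_sequence_match\t100\t900\t.\t+\t.\tID=e1\n",
]
print(count_features(example))
```

Running the same function on the GFF from each run (for example with `open(path)` as the line source) gives counts that are at least computed the same way for every run.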

I wondered whether I had mis-specified a parameter in the opts.ctl file, but I believe I switched on the relevant ones: est_pass=1, altest_pass=1, and protein_pass=1 in the Re-annotation Using MAKER Derived GFF3 section, and est2genome=1 and protein2genome=1 in the Gene Prediction section.
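A quick way to double-check which of these options are actually switched on is to parse the control file rather than read it by eye. The sketch below assumes the file consists of key=value lines with # comments; the helper name parse_ctl and the maker_gff path in the sample are hypothetical.

```python
def parse_ctl(text):
    """Parse key=value lines from a control file, ignoring '#' comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and section headers
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


# Sample content using the option names from this post; the path is hypothetical.
sample = """\
#-----Re-annotation Using MAKER Derived GFF3
maker_gff=round1.all.gff #hypothetical path
est_pass=1
altest_pass=1
protein_pass=1
#-----Gene Prediction
est2genome=1
protein2genome=1
"""

opts = parse_ctl(sample)
expected_on = ["est_pass", "altest_pass", "protein_pass", "est2genome", "protein2genome"]
# Any option listed here is either absent or not set to 1.
not_enabled = [key for key in expected_on if opts.get(key) != "1"]
print(not_enabled)
```

For a real run, the contents of the control file would be passed in place of the sample string.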

My second attempt at a reannotation avoided the Re-annotation Using MAKER Derived GFF3 section altogether (all values empty). Instead, I passed in the two new transcriptomes as EST evidence as before, but split the initial _Maker-Round1.gff_ into its est, altest, protein, and repeat components (_awk '{if ($2=="...") print $0}' Maker-Round1.gff > component.gff_, where the ... varied to match each of those four components). I then added the aligned same-species transcriptome as est_gff and the previously aligned alternate-species transcriptome as altest_gff, both in the EST Evidence section. Likewise, I passed the protein GFF file through the protein_gff parameter and added the repeat data through the rm_gff parameter in the Repeat Masking section.
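The awk split described above can be sketched in Python as a single pass that groups records by the source column (column 2). This is only an illustration: the function names are made up, and the source labels in the sample are illustrative stand-ins, since the actual values matched in the awk command are elided above.

```python
from collections import defaultdict


def split_by_source(gff_lines):
    """Group GFF3 feature lines by their source column (column 2)."""
    groups = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section; no more feature records
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            groups[cols[1]].append(line)
    return dict(groups)


def write_components(groups, prefix="component"):
    """Write one file per source, e.g. component.<source>.gff (hypothetical naming)."""
    for source, lines in groups.items():
        with open(f"{prefix}.{source}.gff", "w") as handle:
            handle.writelines(lines)


# Made-up records; the source labels are illustrative only.
example = [
    "##gff-version 3\n",
    "chr1\test2genome\texpressed_sequence_match\t100\t900\t.\t+\t.\tID=e1\n",
    "chr1\tprotein2genome\tprotein_match\t120\t880\t.\t+\t.\tID=p1\n",
    "chr1\trepeatmasker\tmatch\t300\t350\t.\t+\t.\tID=r1\n",
    "chr1\test2genome\texpressed_sequence_match\t950\t990\t.\t+\t.\tID=e2\n",
]
groups = split_by_source(example)
print({source: len(lines) for source, lines in groups.items()})
```

Printing the per-source counts first is a cheap way to see which labels are present in the file before deciding which ones to pass as est_gff, altest_gff, protein_gff, and rm_gff.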

This second attempt also yielded fewer transcripts than the original run! To add further intrigue (for me, at least), it produced more transcripts than the first reannotation attempt.

Thanks for any clarification on what steps are most appropriate. Perhaps I need to just start from scratch and align everything anew.

Cheers!

maker annotation Maker Annotation • 885 views
