Maker2 Transcriptome Input
2.9 years ago
Colaptes ▴ 70

Hello,

I am annotating the genome of a non-model bird using the Maker2 pipeline and I have a question about filtering the RNA input.

Since no transcriptome was available for my species, I downloaded RNA-seq reads from the closest relative on NCBI and assembled a transcriptome with Trinity. The output contains hundreds of thousands of contigs (275,967 "Trinity genes", 542,886 contigs, contig N50 = 687), which is clearly far more than the number of real genes (I would expect ~20,000), although I have heard that it is normal to get many more "Trinity genes" than real genes. My question is whether I should filter this transcriptome (by RPKM, for example) to reduce its size, or whether it is better to provide the entire set as RNA evidence for Maker2.

I started Maker2 with the entire Trinity output as the RNA input and it has been running for over a month. I suspect the slow tBLASTx step is one of the bottlenecks, as it has to run through hundreds of thousands of transcripts. Is it recommended to reduce the size of the RNA input, or would that risk excluding some real transcripts from the annotation? I would like to speed things up for future annotations without sacrificing the quality of the annotation.

Thank you.
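
For anyone wanting to try the filtering described above, here is a minimal sketch of an expression and length filter on a Trinity FASTA. It is not a recommendation that filtering is safe: the thresholds are arbitrary placeholders, and the expression table is assumed to be a hypothetical two-column, tab-separated file (transcript ID, TPM) that you would derive from whatever quantification tool you use. The function names are my own. Low thresholds are worth starting with, since lowly expressed real transcripts would be dropped too.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def read_expression(handle):
    """Read an ASSUMED two-column TSV (transcript ID, TPM) into a dict."""
    tpm = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        try:
            tpm[fields[0]] = float(fields[1])
        except ValueError:
            continue  # non-numeric value, e.g. a header row
    return tpm


def filter_transcripts(fasta_handle, tpm, min_tpm=1.0, min_len=200):
    """Yield records passing both cutoffs (placeholder values, not advice)."""
    for header, seq in read_fasta(fasta_handle):
        transcript_id = header.split()[0]  # ID is the first token of the header
        if len(seq) >= min_len and tpm.get(transcript_id, 0.0) >= min_tpm:
            yield header, seq


# Tiny in-memory demo with made-up records; real use would open files instead.
fasta = io.StringIO(
    ">TRINITY_DN1_c0_g1_i1 len=12\nACGTACGTACGT\n"
    ">TRINITY_DN1_c0_g1_i2 len=4\nACGT\n"
    ">TRINITY_DN2_c0_g1_i1 len=8\nACGTAC\nGT\n"
)
table = io.StringIO(
    "transcript_id\tTPM\n"
    "TRINITY_DN1_c0_g1_i1\t5.2\n"
    "TRINITY_DN1_c0_g1_i2\t9.0\n"
    "TRINITY_DN2_c0_g1_i1\t0.3\n"
)
kept = list(filter_transcripts(fasta, read_expression(table), min_tpm=1.0, min_len=8))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

Transcripts missing from the table are treated as TPM 0 and dropped, which may or may not be what you want; comparing the annotation from a filtered and an unfiltered run on a single scaffold would be one way to check what is lost.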

Maker2 Annotation Trinity Transcriptome • 1.0k views

Use a genome-guided assembly, then; it would produce far fewer false positives.

Whatever transcriptome you end up constructing should be provided as altest= and not est= in the maker_opts.ctl file if the transcriptome comes from a different species.

In case it is an evidence-based annotation (est2genome=1), I would suggest using both, because they do not use the same e-value cutoff. MAKER doesn't create any gene models from the altest= option; it is just used to add UTRs. So if some of the transcripts map quite well, it would be a pity not to use them to create gene models.

I realise that I don't know if altest= data is used to create hints for the ab-initio predictors. Something I would like to know.
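
To make the options discussed above concrete, here is a small sketch that sets values in the text of a maker_opts.ctl file while keeping the trailing comments. The helper name `set_ctl_options` is my own, the control-file excerpt is abbreviated with paraphrased comments, and the FASTA file name is a hypothetical placeholder; only the option names est=, altest= and est2genome= come from this thread.

```python
import re


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the given `key=value` lines rewritten.

    Trailing `#` comments on each option line are preserved. Raises KeyError
    if an option in `updates` is not present in the file text.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            out.append(f"{key}={remaining.pop(key)} {comment}".rstrip())
        else:
            out.append(line)  # section headers and untouched options
    if remaining:
        raise KeyError(f"options not found: {sorted(remaining)}")
    return "\n".join(out) + "\n"


# Abbreviated, paraphrased excerpt of a control file (not the full file).
example_ctl = (
    "#-----EST Evidence\n"
    "est= #ESTs or assembled mRNA-seq from the same species\n"
    "altest= #EST/cDNA sequences from an alternate organism\n"
    "est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no\n"
)

# Hypothetical file name: a Trinity assembly built from a related species.
updated = set_ctl_options(example_ctl, {"altest": "related_species_trinity.fasta"})
print(updated)
```

Trying the same assembly under est= with est2genome=1, as suggested above, would be a second call with `{"est": ..., "est2genome": "1"}`; whether that is sensible for a distant relative is exactly the open question here.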

Oh, I did not know that! I have been using altest= and assuming that it was being used to create hints. Sadly the closest RNA reads are from a fairly distant organism (~35 million years diverged) so I'm not sure est= would work well.

I checked on the MAKER mailing list, and it looks like altest= creates hints for the ab initio gene predictors (Carson said it is used to anchor gene prediction). It is also used to add UTRs. But he clearly says that it is not used to create gene models.

Thanks! I will look at guided assembly. Unfortunately, many of the species I am assembling either do not have a close reference genome yet, or the one available is, I think, too fragmented for guided assembly.