I want to identify exon-intron boundaries by aligning the transcripts of a given species to its genome with exonerate. For the transcripts, I have both a CDS file and a protein FASTA file. exonerate offers a number of alignment models, but honestly I'm not sure which is best for my purpose. I've narrowed the list down to four candidate models:
est2genome – This model is similar to the affine:local model, but it also includes intron modelling on the target sequence to allow alignment of spliced to unspliced coding sequences for both forward and reversed genes. This is similar to the alignment models used in programs such as EST_GENOME and sim4.
protein2genome – This model allows alignment of a protein sequence to genomic DNA. This is similar to the protein2dna model, with the addition of modelling of introns and intron phases. This model is similar to those used by genewise.
coding2genome – This is similar to the est2genome model, except that the query sequence is translated during comparison, allowing a more sensitive comparison.
cdna2genome – This combines properties of the est2genome and coding2genome models, to allow modelling of whole cDNA where a central coding region can be flanked by non-coding UTRs. When the CDS start and end are known, they may be specified using the --annotation option to permit only the correct coding region to appear in the alignment.
I don't see a need for coding2genome, since my query sequences are already translated (i.e., the amino-acid FASTA file), in which case I could use protein2genome directly. I'm not sure what the query input should be for est2genome or cdna2genome, but would it be faster/easier, and just as accurate, to use the CDS file as the query against the genome with either of these two models?
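To make the comparison concrete, here is a rough sketch that only builds (does not run) exonerate command lines for the four models: protein2genome takes the protein FASTA as the query, while est2genome, coding2genome and cdna2genome take nucleotide queries such as the CDS file. The helper names, file names and the choice of --showvulgar/--showtargetgff output are my own assumptions. The annotation line format (<id> <strand> <cds_start> <cds_length>, with a 1-based start) is also an assumption to check against the exonerate man page; for a CDS-only query the coding region would simply span the whole sequence.

```python
from typing import Dict, List, Optional

# Kind of query sequence each exonerate model expects.
QUERY_KIND: Dict[str, str] = {
    "protein2genome": "protein",
    "est2genome": "nucleotide",
    "coding2genome": "nucleotide",
    "cdna2genome": "nucleotide",
}


def exonerate_command(model: str, query_fasta: str, genome_fasta: str,
                      annotation: Optional[str] = None) -> List[str]:
    """Build (but do not run) an exonerate command line for one model."""
    if model not in QUERY_KIND:
        raise ValueError(f"unsupported model: {model}")
    cmd = ["exonerate", "--model", model,
           "--showvulgar", "yes",      # one compact line per alignment
           "--showalignment", "no",    # skip the human-readable alignment
           "--showtargetgff", "yes"]   # GFF features on the genome
    if annotation is not None:
        # Intended for cdna2genome: restricts where the CDS may align.
        cmd += ["--annotation", annotation]
    return cmd + [query_fasta, genome_fasta]


def cds_annotation_line(seq_id: str, seq: str) -> str:
    """Annotation line for a CDS-only query (coding region = whole sequence).

    ASSUMPTION: format '<id> <strand> <cds_start> <cds_length>', 1-based start.
    """
    return f"{seq_id} + 1 {len(seq)}"


if __name__ == "__main__":
    # Hypothetical file names.
    print(" ".join(exonerate_command("protein2genome", "proteins.fa", "genome.fa")))
    print(" ".join(exonerate_command("cdna2genome", "cds.fa", "genome.fa",
                                     annotation="cds.ann")))
```

The command lines could then be passed to subprocess.run on a small test set to compare runtime and boundary agreement between protein2genome and the nucleotide-based models.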
EDIT: In case it is useful: I then want to use the exon-intron boundaries output by exonerate to compare exons from this species' transcripts against the transcriptomes of other species, using reciprocal best hits (blastn) or blastx.
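For that downstream step, the exon coordinates can be pulled out of exonerate's vulgar line. This is a minimal sketch under my own assumptions: one vulgar line per alignment; the 5' splice site (5), intron (I) and 3' splice site (3) labels count as intronic and every other label (M, C, G, N, S, F) as exonic; coordinates are exonerate's 0-based in-between positions on the target. The function name and the example line are hypothetical.

```python
from typing import List, Tuple

# Vulgar labels treated as intronic: 5' splice site, intron body, 3' splice site.
INTRON_LABELS = {"5", "I", "3"}


def exons_from_vulgar(line: str) -> List[Tuple[int, int]]:
    """Return exon (start, end) target coordinates, start < end, in query order."""
    fields = line.split()
    if fields[0] == "vulgar:":
        fields = fields[1:]
    # Header: query id/start/end/strand, target id/start/end/strand, score.
    t_start, t_end, t_strand = int(fields[5]), int(fields[6]), fields[7]
    step = -1 if t_strand == "-" else 1
    triplets = fields[9:]
    if len(triplets) % 3 != 0:
        raise ValueError("vulgar segments must be (label, query_len, target_len)")

    exons: List[Tuple[int, int]] = []
    pos = t_start
    exon_start = None
    for i in range(0, len(triplets), 3):
        label, t_len = triplets[i], int(triplets[i + 2])
        if label in INTRON_LABELS:
            if exon_start is not None:  # close the current exon
                exons.append((min(exon_start, pos), max(exon_start, pos)))
                exon_start = None
        elif exon_start is None:        # open a new exon
            exon_start = pos
        pos += step * t_len             # advance along the target
    if exon_start is not None:
        exons.append((min(exon_start, pos), max(exon_start, pos)))
    if pos != t_end:
        raise ValueError("segment lengths do not add up to the target span")
    return exons


if __name__ == "__main__":
    # Hypothetical protein2genome-style line with one intron.
    line = "vulgar: q1 0 20 + chr1 100 190 + 80 M 10 30 5 0 2 I 0 26 3 0 2 M 10 30"
    print(exons_from_vulgar(line))
```

For the hypothetical line above this prints [(100, 130), (160, 190)]; the exon sequences could then be extracted from the genome and used as blastn/blastx queries.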