8.3 years ago by
Boston, MA USA
I would join the contigs if a couple of important criteria are met:
1) Each contig must align to a unique portion of the distant relative, with no overlap in the residue positions covered on that relative. You don't want contig A to match amino acids 15 to 97 and contig B to match amino acids 85 to 188: an overlap like that indicates the two contigs should be joined during assembly of the genome, not manually for the sake of this gene hunting/modeling.
2) Each contig should have roughly the same percent identity and percent similarity to the relative. "Roughly" is key here, and it is hard to define an acceptable range. You do not want to be dealing with paralogs (two genes) when you're assuming a single gene. In other words, don't manually create a gene fusion.
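The two checks above can be sketched in a few lines of Python. This is only an illustration: the `Hit` record, the `check_join` helper, and the 10-point identity cutoff are all made up for the example (as noted, there is no agreed acceptable range), and the coordinates are assumed to be 1-based, inclusive positions on the relative's protein taken from your alignment output.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Hit:
    """One contig's alignment to the relative's protein (hypothetical record)."""
    contig: str
    ref_start: int       # first aligned residue on the relative, 1-based inclusive
    ref_end: int         # last aligned residue on the relative, inclusive
    pct_identity: float


def overlap(a: Hit, b: Hit) -> int:
    """Number of residues on the relative covered by both hits (0 if none)."""
    return max(0, min(a.ref_end, b.ref_end) - max(a.ref_start, b.ref_start) + 1)


def check_join(hits, max_identity_spread=10.0):
    """Return a list of problems; an empty list means both criteria pass.

    max_identity_spread is an arbitrary illustrative cutoff, not a standard.
    """
    problems = []
    # Criterion 1: no two contigs may cover the same residues on the relative.
    for a, b in combinations(hits, 2):
        n = overlap(a, b)
        if n > 0:
            problems.append(f"{a.contig} and {b.contig} overlap by {n} residues")
    # Criterion 2: percent identity should be roughly the same across contigs.
    spread = max(h.pct_identity for h in hits) - min(h.pct_identity for h in hits)
    if spread > max_identity_spread:
        problems.append(f"percent identity differs by {spread:.1f} points")
    return problems


# The overlapping example from criterion 1 (identity values are invented).
hits = [Hit("contigA", 15, 97, 62.0), Hit("contigB", 85, 188, 60.5)]
print(check_join(hits))
```

An empty list only means nothing obvious argues against joining; a large identity spread is a hint of paralogs, not proof.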
I would also run the translation of the contigs against motif finders (e.g., Pfam) to help identify what, if anything, is missing from the protein-coding portion of your gene model.
As to the biological function of the TA repeats: they could be transposon insertion sites or remnants, they could be structural for the DNA itself, or they could be (but are unlikely to be) binding sites for DNA modification enzymes or transcription factors. Genetically, they can be used as markers.