Hello,
I sequenced a genome at very low coverage, so the de novo assembly is really fragmented: the N50 is around 5,000 bp, and the longest contig is ~110 kb. The species' genome size is around 1.4 Gb.
There is a distantly related species (diverged >160 million years ago) with an annotated genome, so I want to use it to identify homologous genes in my de novo assembly. I thought of two approaches:
I can get the Ensembl protein sequences from the distantly related species and use TBLASTN to align them to my genome. But my assembly is really fragmented, so one gene might have different parts on different contigs, and some parts may be missing. I don't know how to solve this.
Or I can try to align my assembled contigs to the reference genome, in effect putting the contigs in place, and then extract the genes. But I am not very familiar with which tools should be used for genome alignment between such distantly related species.
Any thoughts?
Thank you!
I guess they are too distantly related for a contig-to-genome alignment. I would go for mapping the annotated peptides to your contigs. I have used Spaln for this and it worked great, but my pair of species was more closely related.
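
If it helps, here is a rough sketch of how one might check which proteins end up split across several contigs after a peptide-to-contig search. It assumes the hits are in the standard 12-column tabular format (for example TBLASTN run with -outfmt 6), where columns 7 and 8 are the start and end on the query protein. The function names, the protein_lengths dictionary, and the E-value cutoff are all hypothetical choices for illustration, not part of any tool.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals (1-based, inclusive)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def summarize_hits(lines, protein_lengths, max_evalue=1e-5):
    """Summarize, per query protein, which contigs it hits and how much of it is covered.

    lines: iterable of tab-separated hits in the standard 12-column layout
           (qseqid sseqid pident length mismatch gapopen qstart qend
            sstart send evalue bitscore).
    protein_lengths: dict mapping protein ID -> length in amino acids
                     (assumed to be supplied by the caller).
    max_evalue: illustrative cutoff; tune it for your data.
    """
    spans = defaultdict(list)
    contigs = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, contig = fields[0], fields[1]
        if float(fields[10]) > max_evalue:
            continue
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        spans[query].append((qstart, qend))
        contigs[query].add(contig)

    summary = {}
    for query, intervals in spans.items():
        covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
        summary[query] = {
            "contigs": sorted(contigs[query]),
            "covered_fraction": covered / protein_lengths[query],
        }
    return summary
```

A protein whose entry lists more than one contig is a candidate for a gene broken across contigs, and a low covered_fraction suggests parts of it are missing from the assembly. This only counts query coverage; it does not check that the pieces are in a consistent order or strand.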