Question: extracting genes from highly fragmented genome
4.7 years ago by
United States
celesty0 wrote:


I sequenced a genome at very low coverage, so the de novo assembly is highly fragmented: the N50 is around 5,000 bp and the longest contig is ~110 kb. The species' genome size is around 1.4 Gb.

There is a distantly related species (diverged >160 million years ago) with an annotated genome, so I want to use it to identify homologous genes in my de novo assembly. I thought of two approaches:

I can take the Ensembl protein sequences from the distantly related species and use TBLASTN to align them to my genome. But my assembly is very fragmented, so different parts of one gene might sit on different contigs, and some parts may be missing. I don't know how to solve this.

Or I can try to align my assembled contigs to that annotated genome, essentially putting the contigs in place, and then extract the genes. But I am not very familiar with which tools should be used for genome alignment between such distantly related species.

Any thoughts?

Thank you!



alignment next-gen gene genome • 1.2k views
modified 4.6 years ago by Biostar ♦♦ 20 • written 4.7 years ago by celesty0

I guess they are too distantly related for a contig-to-genome alignment. I would go for mapping annotated peptides to your contigs. I have used Spaln for this and it worked great, but my pair of species was more closely related.
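
Whichever protein-to-genome aligner is used, a gene can still end up split across several contigs. As a rough illustration (not something from the thread's authors), here is a minimal Python sketch that groups hits by query protein and reports which contigs each protein touches and what fraction of the protein is covered. It assumes the hits are in the default BLAST tabular column order (`-outfmt 6`, as TBLASTN can produce). The function names, the `protein_lengths` dictionary, and the E-value cutoff are all hypothetical choices made for this example.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def summarize_hits(lines, protein_lengths, max_evalue=1e-5):
    """Summarize tabular hits per query protein.

    Assumes default outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    protein_lengths (hypothetical input) maps protein ID -> length in residues.
    Returns {protein: (sorted contig names, fraction of protein covered)}.
    """
    intervals = defaultdict(list)
    contigs = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        if float(fields[10]) > max_evalue:
            continue  # skip weak hits; the cutoff is an arbitrary assumption
        intervals[query].append((qstart, qend))
        contigs[query].add(subject)

    summary = {}
    for query, spans in intervals.items():
        covered = sum(end - start + 1 for start, end in merge_intervals(spans))
        summary[query] = (sorted(contigs[query]), covered / protein_lengths[query])
    return summary
```

Proteins whose hits span more than one contig but together cover most of the protein are candidates for genes broken up by the assembly. Proteins with low total coverage are more likely to have parts missing from the assembly altogether.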

written 4.4 years ago by h.mon 29k