OK, I have several hundred fragments of a protein of interest (699 sequences) that I would like to align and build a neighbor-joining tree from. In many cases these fragments do not align well to one another, because they cover different regions of the same or similar proteins.
However, whole-protein sequences have been defined and submitted to NCBI and other databases, and there are published trees for this protein in the literature. Is there a way to take the fragments from my metagenome and align them to the known sequences, to determine where each fragment lies on the published tree? The only solution I can think of is to run each sequence (or cluster of sequences) against the predefined tree, built from the original whole-protein sequences in the publication, to see where each fragment would fall.
My sequences are unassembled reads (I can't assemble them; they are too diverse).
Average read length is 400 bp.
General protein length is around 350 aa.
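
To put those two numbers side by side, here is a rough back-of-the-envelope sketch. It assumes an average read lies entirely within the gene and is translated in a single frame, which will not hold for every read; the variable names are just for illustration.

```python
# Rough estimate of how much of the protein one average read can cover.
# Assumption: the whole read falls inside the coding region, in one frame.
read_length_bp = 400      # average read length quoted above
protein_length_aa = 350   # approximate protein length quoted above

codons_per_read = read_length_bp // 3  # complete codons in an average read
fraction_covered = codons_per_read / protein_length_aa

print(f"~{codons_per_read} aa per read, about {fraction_covered:.0%} of the protein")
```

So each fragment covers at best a bit over a third of the protein, which is why many pairs of fragments share little or no overlapping region.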
Is there an easier way to do this?
How accurate would diversity statistics be for this protein? (I would not be adding the known protein sequences for that analysis.)
Thanks in advance for any advice/help.