Let's say I have a dataset of 1000 genes in a FASTA file. Is there a way to BLAST them against de novo assembled, unannotated contigs and extract only those that are found complete (i.e. full length) and in two copies?
The problem with BLAST is that the hits often do not cover the full-length query gene, because of its more variable regions. As a result, each subject gene is split across several hits. Since I want to see which of my 1000 genes are present in two copies in my contigs, I cannot check each of the 1000 BLAST tables by eye, summing the query coverage for each subject gene ID and guessing the number of copies. Besides, I want to extract those complete genes, and BLAST only returns a table of hits.
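To make the manual step concrete, here is a rough sketch of what I would like to automate. It assumes tabular BLAST+ output produced with `-outfmt "6 qseqid sseqid qstart qend qlen"` (that column order is my assumption), and the function names and the 90% coverage threshold are hypothetical. It merges the hits of each query on each contig, treats a contig as carrying one complete copy when the merged hits cover enough of the query, and keeps the genes found complete on exactly two contigs. Two copies on the same contig would be counted as one, so this is only an approximation.

```python
from collections import defaultdict


def covered_length(intervals):
    """Total length covered by 1-based inclusive intervals, overlaps counted once."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def genes_with_n_full_copies(blast_lines, n_copies=2, min_cover=0.9):
    """Return {query: [contigs]} for queries found complete on exactly n_copies contigs.

    Assumed columns (tab-separated): qseqid, sseqid, qstart, qend, qlen.
    """
    hsps = defaultdict(list)  # (query, contig) -> list of query intervals
    qlens = {}                # query -> query length
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        qstart, qend, qlen = int(fields[2]), int(fields[3]), int(fields[4])
        hsps[(qseqid, sseqid)].append((min(qstart, qend), max(qstart, qend)))
        qlens[qseqid] = qlen

    complete = defaultdict(list)
    for (qseqid, sseqid), intervals in hsps.items():
        # Summed query coverage for this contig, with overlapping hits merged.
        if covered_length(intervals) / qlens[qseqid] >= min_cover:
            complete[qseqid].append(sseqid)

    return {q: sorted(s) for q, s in complete.items() if len(s) == n_copies}
```

This only gives me the list of genes and contigs, not the sequences themselves, so the extraction step would still be missing.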
Thanks for your suggestions!