Could someone please let me know how one makes a well-informed decision when choosing a reference genome to assemble a novel bacterial strain, in the real world of bioinformatics?
Is it appropriate to assemble the raw sequence data into contigs, then run blastn on one of the larger contigs to find a similar strain, and attempt a reference-guided assembly with that match?
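For concreteness, the "pick a large contig and BLAST it" step could look roughly like the sketch below. It assumes the assembler writes its contigs to a single FASTA file; the script name and file names are hypothetical, and it uses only the Python standard library to pull out the N longest contigs as a query file.

```python
import sys


def read_fasta(lines):
    """Parse FASTA text (an iterable of lines) into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def longest_contigs(records, n=1):
    """Return the n longest (header, sequence) records, longest first."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)[:n]


def to_fasta(records, width=60):
    """Format records as FASTA text, wrapping sequence lines at `width` characters."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python pick_contigs.py contigs.fasta 3 > query.fasta
    path, n = sys.argv[1], int(sys.argv[2])
    with open(path) as handle:
        recs = read_fasta(handle)
    sys.stdout.write(to_fasta(longest_contigs(recs, n)))
```

The resulting query.fasta could then be searched with BLAST+, e.g. `blastn -query query.fasta -db nt -remote -outfmt 6`, and the top hits inspected as candidate reference strains.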
Is it then informative to predict the ORFs with Glimmer3, or will the assembled consensus sequence actually be uninformative because it will contain parts of the reference genome?
What about the unassembled contigs that are left over? What do people usually do with those: chuck them in the recycling, or try to find some annotation for them?
Could I also ask whether people mostly run Glimmer3 on the finished consensus sequence or on the contigs assembled from the raw sequencing reads?