How to determine phylogenetic family, affinity and similarity from a given set of bacterial short reads and contigs? I have a set of tuberculosis whole-genome short reads and contigs and want to determine which phylogenetic tuberculosis family they belong to. Thx.
You're not going to want to do this with raw reads; contigs or a full assembly are more appropriate. BLAST is an old standby, and while it may not always be the best or sexiest technique, it isn't inappropriate either. The database you choose to search against will affect your answer, though: searching only a database of TB genomes will give different results than searching a more general one. If you only compare against other TB genomes, you may miss potential lateral gene transfer (LGT) events from other bacteria.
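As a rough illustration of the BLAST route, here is a minimal Python sketch that reads BLAST+ tabular output (`-outfmt 6`, the default 12 columns) from searching your contigs against a set of reference genomes, and ranks the references by summed best-hit bitscore. It assumes each subject sequence is one complete reference genome labelled with its family; the IDs `ref_L2`/`ref_L4` and all numbers in the example are made up, and this "sum of best-hit bitscores" ranking is just one simple choice, not a standard method.

```python
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_outfmt6(lines):
    """Yield one dict per hit, skipping blank lines and '#' comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hit = dict(zip(FIELDS, line.split("\t")))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit


def rank_references(lines):
    """Rank subject sequences by the summed bitscore of each contig's single best hit.

    Only the top-scoring subject per query contig is counted, so a contig that
    hits every reference almost equally well only votes for one of them.
    """
    best = {}  # qseqid -> best hit seen so far
    for hit in parse_outfmt6(lines):
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    totals = defaultdict(float)
    for hit in best.values():
        totals[hit["sseqid"]] += hit["bitscore"]
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Made-up hits for illustration only (hypothetical reference IDs).
example = [
    "contig1\tref_L4\t99.98\t50000\t10\t0\t1\t50000\t1000\t51000\t0.0\t92000",
    "contig1\tref_L2\t99.90\t50000\t50\t0\t1\t50000\t1000\t51000\t0.0\t91000",
    "contig2\tref_L2\t99.95\t20000\t10\t0\t1\t20000\t5000\t25000\t0.0\t36800",
    "contig2\tref_L4\t99.99\t20000\t2\t0\t1\t20000\t5000\t25000\t0.0\t36900",
    "contig3\tref_L2\t99.90\t5400\t5\t0\t1\t5400\t200000\t205400\t0.0\t10000",
]

print(rank_references(example))
```

With real data you would pass the lines of the file written by something like `blastn -query contigs.fa -db refs -outfmt 6 -out hits.tsv`. Note how close the competing bitscores are in the toy data: between TB strains the differences are tiny, so a ranking like this is only a first pass.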
There are at least a few evolutionary placement algorithms for working with reads, EST sequences, contigs, etc. (RAxML has one, the Evolutionary Placement Algorithm, EPA), but they may be designed and optimized for broader questions, since they are geared towards placing sequences within, say, a phylum rather than at the strain level. Presumably the majority of your reads and contigs will be either identical or highly similar across strains, which can make things difficult unless you go the whole-genome comparison route.
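One common flavour of whole-genome comparison is k-mer based similarity, the idea behind tools such as Mash (which estimates it with MinHash sketches rather than exact sets). The Python sketch below is a swapped-in illustration of that idea, not something the answer prescribes: it computes the exact Jaccard index of canonical k-mer sets between your assembly and each reference assembly. The function names, `k=21`, and the toy sequences are assumptions for illustration; on real ~4.4 Mb M. tuberculosis assemblies exact Python sets are slow and memory-hungry.

```python
# Hypothetical sketch: whole-genome similarity via exact k-mer Jaccard index.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq, k):
    """Set of canonical k-mers: each k-mer or its reverse complement, whichever sorts first."""
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in "ACGT" for base in kmer):
            continue  # skip k-mers containing N or other ambiguity codes
        rc = kmer.translate(COMPLEMENT)[::-1]
        found.add(min(kmer, rc))
    return found


def assembly_kmers(contigs, k):
    """Union of k-mers over all contigs; contigs are not joined, so no junction k-mers."""
    found = set()
    for contig in contigs:
        found |= kmers(contig, k)
    return found


def jaccard(a, b):
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_jaccard(query_contigs, references, k=21):
    """Return [(reference_name, jaccard), ...] sorted from most to least similar."""
    query = assembly_kmers(query_contigs, k)
    scores = [(name, jaccard(query, assembly_kmers(contigs, k)))
              for name, contigs in references.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    query = ["ACGTTGCATGCAAGT"]
    references = {"refA": ["ACGTTGCATGCAAGT"], "refB": ["TTTTTTTTTTTTTTT"]}
    print(rank_by_jaccard(query, references, k=5))
```

Between M. tuberculosis strains these scores will all sit very close to 1, which is exactly the difficulty described above; a ranking like this can shortlist the nearest references, but strain-level placement usually needs finer-grained comparison on top of it.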