I identified SNPs in about 60 bacterial genomes with GATK, using the same reference genome throughout, so I now have a separate VCF file of SNPs for each genome. My main goal is to construct a maximum likelihood phylogenetic tree from these SNPs, but I am stuck on how to proceed. As I understand it, the next step is to merge the VCF files, which I have done, but I am not sure how to feed the merged file into phylogenetic software to build the tree.
I was thinking of parsing the VCF files to extract the reference and alternate nucleotides and then generating a FASTA sequence for each genome that can be used for phylogenetic analysis. Is that a correct approach, or are there existing scripts or software that already do this?
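A minimal sketch of the parsing idea above, in Python, assuming a plain-text (uncompressed) merged multi-sample VCF in which every sample has a GT field. The script name `vcf2fasta.py` and the functions `vcf_to_snp_alignment` and `format_fasta` are hypothetical, written for illustration only. The sketch keeps only biallelic SNPs (single-base REF and ALT), maps haploid (`0`/`1`) or homozygous diploid (`0/0`, `1/1`) calls to a base, and writes missing or heterozygous calls as `N`.

```python
import sys
from typing import Dict, Iterable, List


def vcf_to_snp_alignment(lines: Iterable[str]) -> Dict[str, str]:
    """Build one SNP string per sample (plus the reference) from a merged VCF.

    Hypothetical helper. Assumes a plain-text multi-sample VCF with a
    #CHROM header line and a GT key in the FORMAT column.
    """
    samples: List[str] = []
    seqs: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            seqs = {name: [] for name in ["reference"] + samples}
            continue
        ref, alt = fields[3], fields[4]
        # Keep only biallelic SNPs: skip indels, multiallelic sites,
        # monomorphic sites (ALT ".") and symbolic alleles such as "*".
        if len(ref) != 1 or len(alt) != 1 or alt not in "ACGT":
            continue
        alleles = [ref, alt]
        gt_idx = fields[8].split(":").index("GT")
        seqs["reference"].append(ref)
        for name, sample_field in zip(samples, fields[9:]):
            parts = sample_field.split(":")
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            calls = gt.replace("|", "/").split("/")
            # Haploid "0"/"1" or homozygous "0/0", "1/1" map to a base;
            # missing (".") or heterozygous calls are written as N.
            if len(set(calls)) == 1 and calls[0] in ("0", "1"):
                seqs[name].append(alleles[int(calls[0])])
            else:
                seqs[name].append("N")
    return {name: "".join(bases) for name, bases in seqs.items()}


def format_fasta(alignment: Dict[str, str], width: int = 60) -> str:
    """Format sequences as FASTA text, wrapping lines at `width` characters."""
    records = []
    for name, seq in alignment.items():
        wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append(">" + name + "\n" + "\n".join(wrapped))
    return "\n".join(records) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python vcf2fasta.py merged.vcf > snps.fasta
    with open(sys.argv[1]) as handle:
        sys.stdout.write(format_fasta(vcf_to_snp_alignment(handle)))
```

The output contains only variable sites, so whether writing missing calls as `N` is appropriate depends on how the per-genome VCFs were merged, and an ML program given a SNP-only alignment may need an ascertainment-bias correction; the documentation of the chosen tool is the place to check.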
Since I ran the GATK SNP workflow separately for each genome, I am also wondering whether I should instead run it on all the genomes together in a single run.
I look forward to your suggestions.