I have whole genome sequencing data of 1605 bacterial samples in paired-end reads fastq format. These samples come from 3 phenotypes. I have also a reference sequence fasta file. I have aligned 1600 samples to the reference genome and then I passed bam files to samtools mpileup:
samtools mpileup -g -f "ref.fa" -o mpileup.bcf -b <list of="" all="" 1605="" bam="" files="">
Then I call variants using this command: bcftools call -v -m --ploidy 1 -O b -f gq,gp -o variant.bcf mpileup.bcf
Now I have a huge bcf file containing 1605 samples and 312924 variant sites. How I can find variants that are specific to each phenotype? There are three phenotypes across 1605 samples.