(Warning: newbie question.)
The problem: we have a diploid reference strain and three experimental strains derived from it. We have the FASTA sequence of the reference, with both homologs of each chromosome. We also sequenced the three experimental strains (reads in FASTQ format). We need to know how each experimental strain differs from the reference strain: what genetic changes, if any, did it undergo?
As I see it, the straightforward way would be to get a list of all SNPs in the reference strain (as a VCF file), call SNPs on each experimental strain, and then filter the reference SNPs out of each strain's SNPs. I have figured out how to do the SNP calls on the strains, and I found tools that can filter one VCF against another, but I don't know how to perform the first step: getting a list of SNPs from the reference FASTA. How can this be done? Alternatively, what are other standard ways of discovering mutations?
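To make the filtering step concrete, here is a rough Python sketch of what I mean by "filter out the reference SNPs from the strain SNPs". It is only an illustration, not what any particular tool does: the function names (`variant_keys`, `subtract_variants`) and the toy records are made up, and it assumes plain tab-separated VCF lines matched on chromosome, position, REF and ALT.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from VCF text lines, one key per ALT allele."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            keys.add((chrom, pos, ref, alt))
    return keys


def subtract_variants(strain_lines, baseline_lines):
    """Return strain records carrying at least one ALT allele absent from the baseline."""
    baseline = variant_keys(baseline_lines)
    novel = []
    for line in strain_lines:
        if not line.strip() or line.startswith("#"):
            continue
        record = line.rstrip("\n")
        # A record is kept if any of its alleles is not already in the baseline set.
        if variant_keys([record]) - baseline:
            novel.append(record)
    return novel


# Toy data (hypothetical): one SNP already present in the reference strain,
# and one SNP seen only in the derived strain.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
reference_vcf = [header, "chr1\t100\t.\tA\tG\t50\tPASS\t."]
strain_vcf = [
    header,
    "chr1\t100\t.\tA\tG\t60\tPASS\t.",
    "chr1\t250\t.\tC\tT\t45\tPASS\t.",
]

novel = subtract_variants(strain_vcf, reference_vcf)
for record in novel:
    print(record)
```

Note that this sketch compares sites and alleles only and ignores genotypes, so a change from heterozygous to homozygous at a site already in the reference list would not show up. It also does no normalization of how variants are represented.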
(There was also a suggestion of comparing the VCFs of the three strains against each other, but that sounds inelegant. What if we only had one derived strain?)
Thanks in advance!