We have whole-genome resequenced strains (Illumina) of a non-model yeast, mapped with BWA to a reference genome that was sequenced with PacBio. From the mapping files, we used GATK to do joint-genotyping variant calling, followed by filtering. The type strain used for the high-depth PacBio sequencing was also sequenced with Illumina, and we want to remove the SNPs that are common to this type strain and all the other resequenced strains, so that we can ascertain only the SNPs that are unique to the resequenced population. The type strain is not related to the remaining population, but we feel that the "artifactual" SNPs from this strain are making the data noisy.
So simply, how do we remove SNP columns that are common to the resequenced population and the type strain?
I have tried this with vcftools and plink, but have so far been unsuccessful.
Thanks in advance.
Can we see a sample of your VCF, please, along with the expected output?
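Until a sample is posted, here is a minimal sketch of one possible reading of the question: drop every site where the type strain itself carries a non-reference allele, on the assumption that such calls reflect reference or mapping artifacts rather than real variation. The function name `drop_type_strain_variants` and the sample name `TYPE` are hypothetical, the input is assumed to be an uncompressed multi-sample VCF with a `GT` field, and sites where the type strain has a missing genotype are kept. This may well need adjusting once the real VCF and expected output are known.

```python
import re


def drop_type_strain_variants(vcf_lines, type_strain):
    """Yield header lines plus records where `type_strain` has no ALT allele.

    Assumption: a non-reference call in the type strain marks an artifactual
    site. Missing genotypes (./.) in the type strain are kept, not dropped.
    """
    sample_idx = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            # Meta-information lines pass through unchanged
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9; raises ValueError if absent
            sample_idx = fields.index(type_strain, 9)
            yield line
            continue
        if sample_idx is None:
            raise ValueError("no #CHROM header line seen before records")
        # Locate GT within the FORMAT column, then read the type strain's call
        fmt_keys = fields[8].split(":")
        genotype = fields[sample_idx].split(":")[fmt_keys.index("GT")]
        alleles = re.split(r"[/|]", genotype)
        if any(allele not in ("0", ".") for allele in alleles):
            continue  # type strain carries an ALT allele: drop the site
        yield line
```

For a file on disk, something like `with open("in.vcf") as fh: print("\n".join(drop_type_strain_variants(fh, "TYPE")))` would write the filtered records to standard output (the file name is a placeholder).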