I have a conceptual question. I want to create FASTA files for a genomic region from a multi-sample VCF file, and I am using GATK to do this. I am working with a highly heterozygous species, so the VCF contains quite a few heterozygous SNPs. When I generate the FASTA file for each sample from the VCF, I get a few letters such as K and Y instead of the nucleotide bases A, C, G, and T. Does this mean that the sample is heterozygous for the SNP at those positions?
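
For concreteness, here is a minimal sketch of the mapping I assume is behind those letters: the standard IUPAC two-base ambiguity codes (K = G/T, Y = C/T, and so on). The helper `genotype_to_base` is hypothetical and only handles biallelic SNPs; writing `N` for a missing genotype is my own assumption, not necessarily what GATK does.

```python
# IUPAC ambiguity codes for unordered pairs of distinct bases.
IUPAC_HET = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def genotype_to_base(ref, alt, gt):
    """Return the FASTA character for one sample at one biallelic SNP.

    ref, alt: single-base alleles from the VCF REF and ALT columns.
    gt: the sample's GT string, e.g. "0/1", "1|1" or "./.".
    Hypothetical helper for illustration only.
    """
    alleles = [ref, alt]
    # Treat phased ("|") and unphased ("/") genotypes the same way.
    calls = gt.replace("|", "/").split("/")
    if "." in calls:
        return "N"  # assumption: missing genotype written as N
    bases = {alleles[int(i)] for i in calls}
    if len(bases) == 1:
        # Homozygous: emit the plain base.
        return next(iter(bases))
    # Heterozygous: emit the IUPAC code covering both bases.
    return IUPAC_HET[frozenset(bases)]


print(genotype_to_base("G", "T", "0/1"))  # heterozygous G/T
print(genotype_to_base("C", "T", "1/1"))  # homozygous alternate
```

Under this mapping, a K or Y in a sample's sequence would correspond to a heterozygous G/T or C/T genotype at that position.
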
Also, when making FASTA files from a multi-sample VCF, should I use a VCF that has been filtered on minor allele frequency (MAF), genotype call rate, and other criteria, or should I use the unfiltered VCF?
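
To be clear about what I mean by those two filters, here is a rough sketch of how I understand per-site call rate and MAF to be computed from the sample genotypes at a biallelic site. The function name `site_stats` is hypothetical, and real filtering tools may define these quantities slightly differently.

```python
def site_stats(genotypes):
    """Return (call_rate, maf) for one biallelic site.

    genotypes: list of GT strings, one per sample, e.g. "0/1", "1|1", "./.".
    Hypothetical helper for illustration only.
    """
    called = 0        # samples with a non-missing genotype
    alt_count = 0     # number of ALT alleles among called genotypes
    allele_total = 0  # number of alleles among called genotypes
    for gt in genotypes:
        calls = gt.replace("|", "/").split("/")
        if "." in calls:
            continue  # missing genotype: lowers the call rate
        called += 1
        allele_total += len(calls)
        alt_count += sum(1 for c in calls if c == "1")
    call_rate = called / len(genotypes) if genotypes else 0.0
    if allele_total == 0:
        return call_rate, 0.0
    alt_freq = alt_count / allele_total
    # The minor allele is whichever of REF/ALT is rarer.
    return call_rate, min(alt_freq, 1 - alt_freq)


call_rate, maf = site_stats(["0/0", "0/1", "1/1", "./."])
print(call_rate, maf)
```

A site would then be kept or dropped by comparing these two values against whatever thresholds are chosen.
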