My bacterial variant call files (.vcf) have no GT (genotype) entries, so I got stuck when trying to use PLINK.
What options are available to still perform a PCA on this prokaryotic data (a pipeline or tutorial, for example)? Note that the .fastq sequencing files were obtained from a mixed culture.
My data: I have mapped raw reads (.fastq files) from evolved E. coli genomes to my reference genome, and did this for several samples. I want to see how different or similar the samples are with respect to their variants. I therefore exported the variants as .vcf, which looks something like the excerpt shown below (header line and first records; more than 50 lines in total):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Variants:_LGC19-XL01_S63_L001_R_001_(trimmed) Variants:_Trimmed_LGC19-VP03_S3_L001_R_001
CFC381_K12_Bw25113 172181 . T G 70.56 . NS=1;RF=0.829;VF=0.171;SB=1;SB50=0.031;SB65=0.15;TYPE=SNP(transversion);CDSCN=196;CDSP=588;CDSPWC=3;CDS=clcACDS;CDNCHG=GGT->GGG;AACHG;PE=None;AVQUAL=22 DP:AO .:. 35:6
CFC381_K12_Bw25113 269662 . T C 100.41 . NS=1;RF=0.774;VF=0.226;SB=0.81;SB50=0.0072;SB65=0.18;TYPE=SNP(transition);CDSCN=369;CDSP=1105;CDSPWC=1;CDS=ykfCCDS;CDNCHG=TTA->CTA;AACHG;PE=None;AVQUAL=14 DP:AO .:. 93:21
I have now merged many of these .vcf files with bcftools.
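To make concrete what I am after, here is a rough sketch of one possible route that avoids GT altogether: take the per-sample variant frequency AO/DP from the FORMAT column of the merged file and run a plain PCA on that matrix with NumPy. Several things here are my own assumptions rather than anything established: the function names `parse_af_matrix` and `pca` are made up, the demo sample names are hypothetical, the file is assumed to be tab-separated, and a missing entry (`.:.`) is treated as frequency 0 (variant not detected), which may not be appropriate for low-coverage sites.

```python
import numpy as np


def parse_af_matrix(vcf_lines):
    """Build a samples x sites matrix of variant frequencies (AO / DP).

    Assumes tab-separated VCF lines whose FORMAT column contains DP and AO.
    """
    samples, sites, rows = [], [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        keys = fields[8].split(":")
        dp_i, ao_i = keys.index("DP"), keys.index("AO")
        freqs = []
        for entry in fields[9:]:
            vals = entry.split(":")
            try:
                dp, ao = float(vals[dp_i]), float(vals[ao_i])
                freqs.append(ao / dp if dp > 0 else 0.0)
            except (ValueError, IndexError):
                # Assumption: missing (".") or unparsable values (for
                # example multi-allelic "6,3") count as frequency 0.
                freqs.append(0.0)
        sites.append((fields[0], int(fields[1]), fields[3], fields[4]))
        rows.append(freqs)
    # rows is sites x samples; transpose so that each row is one sample
    return samples, sites, np.array(rows, dtype=float).T


def pca(matrix, n_components=2):
    """PCA via SVD of the column-centered matrix.

    Returns the sample coordinates and the explained variance ratios.
    """
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    total = (s ** 2).sum()
    if total > 0:
        explained = (s[:n_components] ** 2) / total
    else:
        explained = np.zeros(n_components)
    return coords, explained


# Tiny demo with the two records from my excerpt (INFO shortened,
# sample names hypothetical).
demo = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\n"
    "CFC381_K12_Bw25113\t172181\t.\tT\tG\t70.56\t.\tVF=0.171\tDP:AO\t.:.\t35:6\n"
    "CFC381_K12_Bw25113\t269662\t.\tT\tC\t100.41\t.\tVF=0.226\tDP:AO\t.:.\t93:21\n"
)
names, sites, freq = parse_af_matrix(demo.splitlines())
coords, explained = pca(freq)
for name, xy in zip(names, coords):
    print(name, xy)
```

With the real merged file, the lines would come from something like `open("merged.vcf")` (hypothetical file name), and the rows of `coords` could then be plotted as PC1 against PC2. Using frequencies rather than hard genotype calls seems to fit a mixed culture, but I am not sure it is statistically sound, hence the question.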