Hi!
I obtained my .vcf files from variant calling with GATK HaplotypeCaller.
I am new to PLINK and would like to know how to get a set of PLINK files (.ped, .map) from the VCF file for somatic cells. So far I have used the following:
plink --vcf file.vcf --recode --out PLINKfile
But then in the .ped file I have information only about one of the alleles:
person1  person1  0  0  0  -9  G  A  G  C  C  C ...
person2  person2  0  0  0  -9  G  A  G  C  T  C ...
person3  person3  0  0  0  -9  G  T  C  C  C  C ...
As I understand it, for SNPs there should be 2 letters at each position, one for each allele, so it should look like this:
person1  person1  0  0  0  -9  GA  AA  GG  CT  CC  CC ...
person2  person2  0  0  0  -9  GA  AA  GG  CC  TC  CC ...
person3  person3  0  0  0  -9  GA  TA  CG  CT  CC  CC ...
How do I do that? Also, is there a way to encode deletions and insertions, especially if they are longer than 1 nucleotide?
Thank you
ped/map is a very outdated, generally poor, and memory-inefficient way to store data. I would recommend using another format if you can, for example PLINK's binary fileset (.bed/.bim/.fam, written with --make-bed instead of --recode).
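
On the original question: the .ped rows shown already seem to contain both alleles. After the six leading columns (family ID, individual ID, father, mother, sex, phenotype), each variant takes two consecutive columns, so "G A G C C C" reads as three genotypes, not six. Here is a minimal Python sketch to check this on your own file. The function name ped_genotypes is made up for illustration, and it assumes whitespace-delimited rows like the ones posted:

```python
def ped_genotypes(line):
    """Split one whitespace-delimited .ped row into (individual ID, genotype pairs)."""
    fields = line.split()
    # First six columns: family ID, individual ID, father, mother, sex, phenotype.
    meta, alleles = fields[:6], fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected an even number of allele columns")
    # Each variant occupies two consecutive allele columns.
    pairs = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return meta[1], pairs


if __name__ == "__main__":
    # Row taken from the question, without the trailing "..."
    row = "person1  person1  0  0  0  -9  G  A  G  C  C  C"
    sample, genotypes = ped_genotypes(row)
    print(sample, ["/".join(g) for g in genotypes])
    # prints: person1 ['G/A', 'G/C', 'C/C']
```

Since the split is on whitespace, a multi-character allele code (for example an insertion written as "AT" next to "A") would come through as a single token in the same way, assuming that is how the indel appears in your .ped.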