I have a VCF file containing SNPs called from a trio (two parents and one child). What format should the PED file follow so that I can use it as input to GATK PhaseByTransmission?
The GATK/PLINK forums list the following as essential columns:
Family ID
Individual ID
Paternal ID
Maternal ID
Sex (1=male; 2=female; other=unknown)
Phenotype
So I created a simple PED file (input.ped) as follows:
F1      P      0       0       1       1
F1      M      0       0       2       1
F1      H1a    P       M       1       1
F1      H1b    P       M       1       1
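As a sanity check on the file above, here is a minimal Python sketch that splits each line into the six columns and confirms that every father/mother ID refers to an individual of the expected sex in the same family. It assumes whitespace-separated fields and the usual PLINK convention that a parent ID of 0 means "not in this file"; the names parse_ped and check_parents are made up for illustration and are not part of GATK or PLINK.

```python
# Contents of input.ped, copied from the post.
PED_TEXT = """\
F1      P      0       0       1       1
F1      M      0       0       2       1
F1      H1a    P       M       1       1
F1      H1b    P       M       1       1
"""

COLUMNS = ("family", "individual", "father", "mother", "sex", "phenotype")


def parse_ped(text):
    """Split each non-empty line into the six leading PED columns."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < len(COLUMNS):
            raise ValueError(
                f"line {line_no}: expected 6 columns, got {len(fields)}"
            )
        records.append(dict(zip(COLUMNS, fields[:6])))
    return records


def check_parents(records):
    """Return a list of problems with parent references (empty if none)."""
    by_id = {(r["family"], r["individual"]): r for r in records}
    problems = []
    for r in records:
        # Assumption: sex code 1 = male, 2 = female; parent ID "0" = absent.
        for role, expected_sex in (("father", "1"), ("mother", "2")):
            parent_id = r[role]
            if parent_id == "0":
                continue
            parent = by_id.get((r["family"], parent_id))
            if parent is None:
                problems.append(
                    f"{r['individual']}: {role} {parent_id} is not in the file"
                )
            elif parent["sex"] != expected_sex:
                problems.append(
                    f"{r['individual']}: {role} {parent_id} has sex code "
                    f"{parent['sex']}, expected {expected_sex}"
                )
    return problems


records = parse_ped(PED_TEXT)
children = [
    r["individual"] for r in records
    if r["father"] != "0" and r["mother"] != "0"
]
print("children:", children)
print("problems:", check_parents(records))
```

For this file the check reports no problems, and it lists two individuals (H1a and H1b) that have both parents set.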
Do I need to follow any naming convention for the samples in input.vcf when I run the following command:
java -Xmx2g -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T PhaseByTransmission \
   -V input.vcf \
   -ped input.ped \
   -o output.vcf
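One way to see whether the names line up before running the command is to compare the Individual ID column of the PED file with the sample columns on the #CHROM header line of the VCF. The sketch below assumes the tool matches PED individual IDs against VCF sample names exactly; the example header is hypothetical (it is not taken from the real input.vcf), and compare_ids is an illustrative helper, not a GATK function.

```python
def vcf_sample_names(vcf_lines):
    """Return the sample names from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine tab-separated columns are fixed (CHROM..FORMAT);
            # everything after them is a sample name.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def ped_individual_ids(ped_lines):
    """Return the second column (Individual ID) of each non-empty PED line."""
    return [line.split()[1] for line in ped_lines if line.strip()]


def compare_ids(vcf_lines, ped_lines):
    """Return (IDs only in the PED, sample names only in the VCF)."""
    vcf = set(vcf_sample_names(vcf_lines))
    ped = set(ped_individual_ids(ped_lines))
    return sorted(ped - vcf), sorted(vcf - ped)


# Hypothetical VCF header for illustration: it has no sample named H1b.
example_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP\tM\tH1a\n",
]
example_ped = [
    "F1      P      0       0       1       1\n",
    "F1      M      0       0       2       1\n",
    "F1      H1a    P       M       1       1\n",
    "F1      H1b    P       M       1       1\n",
]

missing_from_vcf, missing_from_ped = compare_ids(example_vcf, example_ped)
print("in PED but not in VCF:", missing_from_vcf)
print("in VCF but not in PED:", missing_from_ped)
```

Both functions accept any iterable of lines, so open file handles for an uncompressed input.vcf and input.ped could be passed in place of the example lists.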
Is the question about one child and two parents? The PED file above lists two children (H1a and H1b).