Question: PED file format for use with GATK PhaseByTransmission
stefano.iantorno (United Kingdom) wrote 4.5 years ago:

I have a VCF file containing SNPs called from a trio (two parents and one child). What format should the PED file follow so that I can use it as input to GATK PhaseByTransmission?

The GATK/PLINK forums list the following as essential columns:

Family ID
Individual ID
Paternal ID
Maternal ID
Sex (1=male; 2=female; other=unknown)


So I created a simple PED file (input.ped) as follows:

F1      P       0       0       1       1
F1      M       0       0       2       1
F1      H1a     P       M       1       1
F1      H1b     P       M       1       1


Do I need to follow any convention when naming the samples in my input.vcf when I run the following command?

java -Xmx2g -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T PhaseByTransmission \
   -V input.vcf \
   -ped input.ped \
   -o output.vcf
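
A quick way to check the naming question is to compare the Individual ID column of the PED file with the sample names in the VCF header. The sketch below is only an illustration: it assumes six whitespace-separated PED columns (the sixth being the phenotype, as in the PLINK convention) and assumes the tool matches PED individual IDs against the sample columns of the `#CHROM` line, which is worth confirming in the GATK documentation. The names `parse_ped`, `vcf_sample_names` and `missing_from_vcf` are hypothetical helpers, not part of GATK.

```python
from typing import List, NamedTuple


class PedRecord(NamedTuple):
    family: str
    individual: str
    father: str
    mother: str
    sex: str
    phenotype: str


def parse_ped(text: str) -> List[PedRecord]:
    """Parse whitespace-separated PED text into records (first six columns only)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns, got {len(fields)}: {line!r}")
        records.append(PedRecord(*fields[:6]))
    return records


def vcf_sample_names(chrom_header_line: str) -> List[str]:
    """Return the sample names from a VCF '#CHROM ...' header line.

    The first nine tab-separated columns are fixed (CHROM .. FORMAT);
    everything after them is a sample name.
    """
    return chrom_header_line.rstrip("\n").split("\t")[9:]


def missing_from_vcf(records: List[PedRecord], samples: List[str]) -> List[str]:
    """List PED individual IDs that have no matching sample column in the VCF."""
    return sorted({r.individual for r in records} - set(samples))
```

Calling `missing_from_vcf` with the records from input.ped and the sample names from the header of input.vcf returns the individuals that would not be found under that assumption; an empty list means every PED individual has a VCF column with the same name.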
Tags: sequencing, snp, next-gen
ebrown1955 (United States) wrote 3.7 years ago:

Unfortunately, PhaseByTransmission only works on trios. Since your PED file lists two children, you will have to run PBT twice, once per child, each time with a "trio" made up of the two parents and that child.
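
One way to prepare the per-child inputs is to split the PED file into one trio per child. This is a minimal sketch, assuming a whitespace-separated PED file in which founders have `0` in both parent columns; `split_into_trios` is a hypothetical helper, and it only builds the PED rows, it does not run any phasing tool.

```python
from typing import Dict, Iterable, List


def split_into_trios(ped_lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Map each child ID to the PED rows [father, mother, child] of its trio."""
    rows = [line.split() for line in ped_lines if line.strip()]
    by_id = {row[1]: row for row in rows}  # column 2 is the Individual ID

    trios = {}
    for row in rows:
        child, father, mother = row[1], row[2], row[3]
        if father == "0" or mother == "0":
            continue  # founder or incomplete parentage: not a trio child
        if father not in by_id or mother not in by_id:
            raise ValueError(f"parents of {child} are not listed in the PED file")
        trios[child] = [by_id[father], by_id[mother], row]
    return trios


if __name__ == "__main__":
    ped = [
        "F1 P 0 0 1 1",
        "F1 M 0 0 2 1",
        "F1 H1a P M 1 1",
        "F1 H1b P M 1 1",
    ]
    for child, trio in split_into_trios(ped).items():
        # Each block could be written to its own file, e.g. one PED file per child.
        print(f"# trio for {child}")
        for row in trio:
            print("\t".join(row))
```

Each resulting trio can then be saved as its own PED file and passed to a separate PhaseByTransmission run.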

If I'm not mistaken, Beagle 4.0 currently accepts PED files with multiple trios and will output a VCF file with all phased genotypes in one pass.


The question is about a child and two parents??

(reply written 13 months ago by SmallChess)
Len Trigg (New Zealand) wrote 3.7 years ago:

Pedigree-aware variant calling is one of the strengths of the Real Time Genomics commands available as part of RTG Core.

You can run simple pedigree-based phasing by transmission on an existing VCF call set using an expert option of the rtg mendelian tool, e.g.:

rtg mendelian -t ref.sdf --pedigree input.ped --input input.vcf --output output.vcf --Xphase

This will phase all offspring calls where possible.

If you have the option to re-run the variant calling itself, you can use rtg family (or rtg population if you have a mixture of families, multi-generation pedigrees, and unrelated samples), which will perform pedigree-aware variant calling. The benefits are that the pedigree informs the Bayesian variant calling itself, the output automatically contains pedigree-phased calls, and de novo variants are marked.

