Question: GATK: .ped vs .fam and missing values
2.2 years ago by
rmf940 wrote:

In the GATK pipeline, it seems that I need a .ped file for CalculateGenotypePosteriors and a .fam file for VariantsToBinaryPed. What is the difference between a .ped file and a .fam file? And how do I specify missing parents in each of them? Some options I've seen are zero (0), NO_PARENTS, -9, etc. I want to be sure about this because I don't want the tool to interpret 0, NO_PARENTS, etc. as the actual ID of a parent.

My file looks like this now:

#family_id      individual_id   paternal_id     maternal_id     sex     phenotype 
20  20-01  m20  f20  1  1
20  20-02  m20  f20  1  1
20  20-03  m20  f20  1  1
21  21-01  m21  f21  1  1
21  21-02  m21  f21  1  1
21  21-03  m21  f21  1  1
20  m20              1  0
20  f20              2  0
21  m21              1  0
21  f21              2  0

You can use 0 for missing parents. Check the link for more information.
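As a rough illustration of that suggestion, here is a minimal Python sketch that fills blank parent columns with 0. It assumes that a row with only four whitespace-separated fields is a founder whose paternal and maternal IDs were left empty, as in the parent rows of the file in the question. The function name `normalize_ped_line` is made up for this example and is not part of GATK or PLINK.

```python
def normalize_ped_line(line):
    """Return a six-field, tab-separated pedigree line, using 0 for absent parents."""
    fields = line.split()
    if len(fields) == 4:
        # Assumption: a four-field row is a founder whose two parent
        # columns were left blank (family, individual, sex, phenotype).
        family, individual, sex, phenotype = fields
        fields = [family, individual, "0", "0", sex, phenotype]
    if len(fields) != 6:
        raise ValueError(f"expected 4 or 6 fields, got {len(fields)}: {line!r}")
    return "\t".join(fields)


# Three rows copied from the question: one child and two founders.
rows = [
    "20  20-01  m20  f20  1  1",
    "20  m20              1  0",
    "20  f20              2  0",
]
normalized = [normalize_ped_line(row) for row in rows]
for row in normalized:
    print(row)
```

Rows that already have six fields pass through unchanged, so the sketch only touches the founder rows.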

written 2.2 years ago by Bioinformatics_NewComer320
2.2 years ago by
Kevin Blighe60k wrote:

The PED and FAM file formats come from the eminent program PLINK.

For a description of PED files, including information on how to encode missing values, please go here: PED files (note that the binary version of a PED file is called BED)

For a description of FAM files, see here: .fam (PLINK sample information file)
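To make the relationship between the two formats concrete: as I understand the PLINK documentation, a PED line starts with six sample-information columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by genotype columns, and a FAM line holds just those six columns. A minimal Python sketch under that assumption follows. The function name `ped_line_to_fam_line` and the genotype values in the example are hypothetical.

```python
def ped_line_to_fam_line(ped_line):
    """Keep the six leading sample-information columns of a PED line."""
    fields = ped_line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}")
    # Everything after column six is genotype data, which a FAM file omits.
    return " ".join(fields[:6])


# Hypothetical PED line: six sample columns plus two made-up genotypes.
example = "20 20-01 m20 f20 1 1 A A G T"
fam_line = ped_line_to_fam_line(example)
print(fam_line)
```

A pedigree file that contains only the six columns, like the one in the question, has the same column layout in either format.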

