Question: GATK: .ped vs .fam and missing values
2.2 years ago by
rmf940 wrote:

In the GATK pipeline, it seems I need a .ped file for CalculateGenotypePosteriors and a .fam file for VariantsToBinaryPed. What is the difference between a .ped and a .fam file? And how do I specify missing parents in each of them? Some options I've seen are zero (0), NO_PARENTS, -9, etc. I want to be sure about this because I don't want the tool to interpret 0, NO_PARENTS, etc. as the actual ID of a parent.

My file looks like this now:

#family_id      individual_id   paternal_id     maternal_id     sex     phenotype 
20  20-01  m20  f20  1  1
20  20-02  m20  f20  1  1
20  20-03  m20  f20  1  1
21  21-01  m21  f21  1  1
21  21-02  m21  f21  1  1
21  21-03  m21  f21  1  1
20  m20              1  0
20  f20              2  0
21  m21              1  0
21  f21              2  0
You can use 0 for missing parents. Check the link for more information.
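
As a rough illustration of the advice above, here is a small Python sketch that pads founder rows with 0 in the paternal and maternal ID columns. It assumes whitespace-separated columns and that a row with only four fields is a founder whose two parent IDs were left blank, as in the file in the question. The helper name `fill_missing_parents` is hypothetical and is not part of GATK or PLINK.

```python
def fill_missing_parents(lines):
    """Return pedigree rows with '0' in the paternal/maternal ID columns
    wherever both were left blank (assumption: such rows have 4 fields)."""
    fixed = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            # Keep header and blank lines unchanged
            fixed.append(line.rstrip("\n"))
            continue
        fields = line.split()
        if len(fields) == 4:
            # family_id, individual_id, sex, phenotype: insert 0 0 for the parents
            fields[2:2] = ["0", "0"]
        elif len(fields) != 6:
            raise ValueError(f"unexpected number of columns: {line!r}")
        fixed.append("\t".join(fields))
    return fixed


rows = ["20  20-01  m20  f20  1  1", "20  m20              1  0"]
for row in fill_missing_parents(rows):
    print(row)
```

The child row passes through with its parent IDs intact, while the founder row gains two 0 columns. A row with only one parent missing cannot be repaired this way, because a blank field is indistinguishable from whitespace; such rows raise an error and need fixing by hand.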

2.2 years ago by
Kevin Blighe wrote:

The PED and FAM file formats come from the eminent program PLINK.

For a description of PED files, including information on how to encode missing values, please go here: PED files (note that the binary version of a PED file is called BED)

For a description of FAM files, see here: .fam (PLINK sample information file)
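
To make the relationship between the two formats concrete: in PLINK, a .fam file holds the same six sample columns that begin each line of a text .ped file (family ID, individual ID, father ID, mother ID, sex, phenotype), and the .ped line then continues with genotype columns. A minimal Python sketch of that relationship follows. The helper name `ped_line_to_fam` is hypothetical and the genotype values are made up for illustration. Whether a given GATK tool reads only the six sample columns is best confirmed in that tool's documentation.

```python
def ped_line_to_fam(ped_line):
    """Return the six sample-information columns of a PLINK text .ped line,
    which is the content a .fam line carries for the same sample."""
    fields = ped_line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns: {ped_line!r}")
    return " ".join(fields[:6])


# A founder (both parents coded 0) followed by two made-up genotypes
print(ped_line_to_fam("20 m20 0 0 1 0 A A G T"))
```

So a pedigree-only file with six columns and 0 for absent parents has the same layout as a .fam file.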


