How to add trio information to VCF or PED format (to compute Mendel error in Plink)
5.2 years ago
olavur ▴ 140

I have some VCF files and a spreadsheet with trio information (child, father and mother IDs). I want to convert the VCF files to Plink format (.ped; I can do this using VCFtools) and compute Mendel errors (in Plink) using these trios.

How can I do this? As I understand it, the .ped file should have fields for the family, individual, father and mother IDs, but I do not know how to fill them in.

plink vcftools SNP • 4.8k views
5.2 years ago
Len Trigg ★ 1.6k

If you don't mind using another tool to identify the Mendelian inconsistencies, you can use the rtg mendelian command from RTG Tools to work directly from VCF. It expects the trio information to be either embedded inside the VCF header using the standard PEDIGREE field, e.g.:

NA12878-trio.vcf

or supplied as a PED pedigree file, e.g.:

NA12878-trio.ped

This latter example demonstrates how to encode the pedigree fields you were asking about.

5.2 years ago

The PED file is a whitespace-delimited (space or tab) file. The first six columns are mandatory:

Family ID
Individual ID
Paternal ID
Maternal ID
Sex (1=male; 2=female; other=unknown)
Phenotype


If you only have a few trios, you can edit the PED file by hand; otherwise, write a small script to do it. Either way, what you need to do is add the trio information to the PED file:

In the row of each offspring, put the Individual ID of the father in the Paternal ID column and the Individual ID of the mother in the Maternal ID column, and give all three members of the trio the same Family ID.
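For the scripted route, something along these lines might work. It is only a minimal sketch: the function name add_trios and the (family, child, father, mother) tuple layout are my own assumptions, not anything Plink defines, and it assumes the Individual IDs in the PED file match the IDs in your trio spreadsheet exactly.

```python
def add_trios(ped_lines, trios):
    """Fill in Family, Paternal and Maternal IDs in PED rows.

    ped_lines: iterable of PED rows (strings).
    trios: iterable of (family_id, child_id, father_id, mother_id) tuples
           (hypothetical layout, adapt to your spreadsheet export).
    """
    family_of = {}   # individual ID -> family ID
    parents_of = {}  # child ID -> (father ID, mother ID)
    for family_id, child, father, mother in trios:
        for iid in (child, father, mother):
            family_of[iid] = family_id
        parents_of[child] = (father, mother)

    updated = []
    for line in ped_lines:
        # Split off the six mandatory columns; keep the genotypes as one chunk.
        fields = line.rstrip("\n").split(None, 6)
        if len(fields) < 6:
            updated.append(line.rstrip("\n"))  # not a valid PED row, keep as-is
            continue
        iid = fields[1]
        if iid in family_of:
            fields[0] = family_of[iid]
        if iid in parents_of:
            fields[2], fields[3] = parents_of[iid]
        updated.append(" ".join(fields))
    return updated
```

Individuals that are not listed in any trio are left untouched, so founders keep 0 in their parent columns.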

5.2 years ago
olavur ▴ 140

The answers on this post helped a bit, but I figured it out myself. I found the process somewhat confusing, and the documentation slightly lacking, so I will explain my solution in some detail.

I have some VCF files, and a separate file that describes child, mother and father relationships, including the sex of each individual. To add all this information, I need the --update-parents and --update-sex flags in plink. The input file format for these flags is not entirely clear from the documentation, so I'll explain it briefly.

In my case, the individuals didn't have family and within-family IDs that lent themselves to this process. So first I had to change the family ID (FID) and the individual ID (IID, also called the within-family ID), such that the full ID of an individual is equal to FID + IID. To do that I call plink --file data --update-ids update_ids.txt --recode --out data_updated_ids.

I shall explain how I made the update_ids.txt file, mentioned above, to fix my formatting problems. Say we have an individual with the ID ABC01, ABC being the family ID, and 01 being the within-family ID. My PED file said both the FID and IID were ABC01. To fix this problem, update_ids.txt has to contain the row:

  ABC01    ABC01    ABC    01


where the columns are the original family ID, the original within-family ID, the new family ID, and the new within-family ID.
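If you have many individuals, the rows of update_ids.txt can be generated with a few lines of Python. This is a rough sketch that assumes every combined ID is a run of letters (the family) followed by a run of digits (the within-family ID), as in ABC01; that splitting rule and the function name update_ids_row are my assumptions, so adjust them to your own naming scheme.

```python
import re

def update_ids_row(old_fid, old_iid):
    """Build one row for plink --update-ids.

    Assumes (hypothetically) that old_iid looks like letters + digits,
    e.g. 'ABC01' -> family 'ABC', within-family ID '01'.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", old_iid)
    if match is None:
        raise ValueError(f"ID does not fit the letters+digits pattern: {old_iid}")
    new_fid, new_iid = match.groups()
    # Columns: old FID, old IID, new FID, new IID
    return "\t".join([old_fid, old_iid, new_fid, new_iid])
```

Calling update_ids_row("ABC01", "ABC01") gives the example row above (tab-separated), and writing one such row per individual produces the whole file.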

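The files for --update-parents and --update-sex follow the same pattern: each row starts with the family ID and the within-family ID of an individual, followed by the new values (the paternal and maternal IDs for --update-parents, the sex code for --update-sex). These flags are also described here: https://www.cog-genomics.org/plink2/data#update_indiv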
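To round this off, here is a small sketch of how the two remaining files could be generated from the trio table. The function names and the tuple layouts are hypothetical, and the IDs are assumed to be the new FID and IID values from the --update-ids step.

```python
def parents_rows(trios):
    """Rows for --update-parents: FID, IID, paternal IID, maternal IID.

    trios: iterable of (family_id, child_iid, father_iid, mother_iid)
           (hypothetical layout).
    """
    return ["\t".join(trio) for trio in trios]

def sex_rows(sexes):
    """Rows for --update-sex: FID, IID, sex code.

    sexes: iterable of (family_id, iid, sex), with sex given as
           1 (male), 2 (female) or 0 (unknown).
    """
    rows = []
    for family_id, iid, sex in sexes:
        if sex not in (0, 1, 2):
            raise ValueError(f"unexpected sex code for {family_id} {iid}: {sex}")
        rows.append(f"{family_id}\t{iid}\t{sex}")
    return rows

def write_rows(path, rows):
    """Write one row per line to a plain text file."""
    with open(path, "w") as handle:
        handle.write("\n".join(rows) + "\n")
```

With the files written (update_parents.txt and update_sex.txt are just names I picked), I would expect a run like plink --file data_updated_ids --update-parents update_parents.txt --update-sex update_sex.txt --recode --out data_trios to apply them, and plink --file data_trios --mendel --out mendel to then report the Mendel errors. Check the plink log to confirm that both updates were actually applied.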