Question: How to add trio information to VCF or PED format (to compute Mendel error in Plink)
23 months ago by
olavur70
Tórshavn, Faroe Islands
olavur70 wrote:

I have some VCF files, and a spreadsheet with trio information (child, father and mother IDs). I want to convert the VCF files to Plink format (.ped, which I can do using VCFtools), and then compute Mendel errors (in Plink) using these trios.

How can I do this? As I understand it, the .ped file should have fields for the family, individual, father and mother IDs, but I do not know how to fill them in.

vcftools snp plink • 2.0k views
23 months ago by
Len Trigg1.2k
New Zealand
Len Trigg1.2k wrote:

If you don't mind using another tool to identify the Mendelian inconsistencies, you can use the rtg mendelian command from RTG Tools to work directly from VCF. It expects the trio information to be either embedded inside the VCF header using the standard PEDIGREE field, e.g.:

NA12878-trio.vcf

or supplied as a PED pedigree file, e.g.:

NA12878-trio.ped

This latter example demonstrates how to encode the pedigree fields you were asking about.

23 months ago by
United Kingdom
GabrielMontenegro500 wrote:

The PED file is a whitespace-delimited (space or tab) file. The first six columns are mandatory:

 Family ID
 Individual ID
 Paternal ID
 Maternal ID
 Sex (1=male; 2=female; other=unknown)
 Phenotype

If you have only a few trios, you can edit the file by hand; otherwise, write a script to do it. Either way, you need to add the trio information to the PED file:

In each child's row, set the Paternal ID to the father's Individual ID and the Maternal ID to the mother's Individual ID, and give all three members of the trio the same Family ID.
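For more than a handful of trios, the edit described above can be scripted. The following is a minimal sketch, not a definitive implementation: the `trios` mapping, the sample IDs and the function name `add_trio_info` are all hypothetical, and it assumes the PED rows are already in memory as whitespace-delimited strings.

```python
# Hypothetical trio table: child ID -> (family ID, father ID, mother ID).
trios = {"kid1": ("FAM1", "dad1", "mom1")}


def add_trio_info(ped_lines, trios):
    """Return PED lines with family, paternal and maternal IDs filled in."""
    # Map every trio member to its family, and every child to its parents.
    family_of = {}
    parents_of = {}
    for child, (fam, father, mother) in trios.items():
        for member in (child, father, mother):
            family_of[member] = fam
        parents_of[child] = (father, mother)

    updated = []
    for line in ped_lines:
        fields = line.split()
        iid = fields[1]  # column 2 is the Individual ID
        if iid in family_of:
            fields[0] = family_of[iid]  # same Family ID for the whole trio
        if iid in parents_of:
            # Columns 3 and 4 are the Paternal ID and Maternal ID.
            fields[2], fields[3] = parents_of[iid]
        updated.append(" ".join(fields))
    return updated


# Made-up example rows: six mandatory columns followed by two genotypes.
ped_lines = [
    "kid1 kid1 0 0 1 -9 A A G T",
    "dad1 dad1 0 0 1 -9 A A G G",
    "mom1 mom1 0 0 2 -9 A A T T",
]
for row in add_trio_info(ped_lines, trios):
    print(row)
```

Rows for the parents keep 0 in the Paternal ID and Maternal ID columns, which marks them as founders.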

23 months ago by
olavur70
Tórshavn, Faroe Islands
olavur70 wrote:

The answers on this post helped a bit, but I figured it out myself. I found the process somewhat confusing, and the documentation slightly lacking, so I will explain my solution in some detail.

I have some VCF files, and a separate file that describes the child, mother and father relationships, including the sex of each individual. To add all this information, I need to use the --update-parents and --update-sex options in plink. These options accept a file format that is not entirely clear from the documentation, so I'll explain it below.

In my case, the individuals didn't have family and within-family IDs that lend themselves to this process. So first I had to change the family ID (FID) and the individual ID (IID, also called the within-family ID), such that the ID of an individual is equal to FID + IID. To do that I call plink --file data --update-ids update_ids.txt --recode --out data_updated_ids.

I shall explain how I made the update_ids.txt file, mentioned above, to fix my formatting problems. Say we have an individual with the ID ABC01, ABC being the family ID, and 01 being the within-family ID. My PED file said both the FID and IID were ABC01. To fix this problem, update_ids.txt has to contain the row:

  ABC01    ABC01    ABC    01

where the columns are the original family ID, the original within-family ID, the new family ID, and the new within-family ID.
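With many individuals, rows like the one above can be generated rather than typed. This is a minimal sketch that assumes every ID consists of a letter prefix (the family) followed by digits (the individual), as in ABC01; that naming rule, the function name `update_ids_row` and the sample list are assumptions for illustration only.

```python
import re


def update_ids_row(old_id):
    """Build one update_ids.txt row: old FID, old IID, new FID, new IID."""
    # Assumed naming scheme: letters (family ID) followed by digits (IID).
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", old_id)
    if match is None:
        raise ValueError(f"ID does not fit the assumed pattern: {old_id}")
    new_fid, new_iid = match.groups()
    # Both the old FID and the old IID were the full ID in the original PED.
    return f"{old_id}\t{old_id}\t{new_fid}\t{new_iid}"


# Hypothetical sample IDs; in practice, read them from the PED file.
sample_ids = ["ABC01", "ABC02", "ABC03"]
rows = [update_ids_row(sample_id) for sample_id in sample_ids]
print("\n".join(rows))
```

Saving the printed rows as update_ids.txt gives a file in the four-column layout described above.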

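The files for --update-parents and --update-sex follow the same pattern: each row starts with the family ID and the within-family ID, followed by the new values (the paternal and maternal IDs for --update-parents, the sex code for --update-sex). These options are also described here: https://www.cog-genomics.org/plink2/data#update_indiv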
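The remaining two input files can be built the same way. The sketch below assumes the layouts given in the PLINK documentation linked above: --update-parents takes family ID, within-family ID, new paternal ID and new maternal ID, and --update-sex takes family ID, within-family ID and a sex code (1 = male, 2 = female). The `trios` records and function names are hypothetical, and the IDs are assumed to be the ones produced by the --update-ids step.

```python
# Hypothetical trio records after the ID update:
# (family ID, child IID, father IID, mother IID, child's sex code).
trios = [("ABC", "01", "02", "03", 2)]


def parents_rows(trios):
    """Rows for --update-parents: FID, IID, paternal IID, maternal IID."""
    return [
        f"{fid}\t{child}\t{father}\t{mother}"
        for fid, child, father, mother, _sex in trios
    ]


def sex_rows(trios):
    """Rows for --update-sex: FID, IID, sex code (1 = male, 2 = female)."""
    sex_of = {}  # keyed by (FID, IID) so shared parents appear only once
    for fid, child, father, mother, child_sex in trios:
        sex_of[(fid, father)] = 1
        sex_of[(fid, mother)] = 2
        sex_of[(fid, child)] = child_sex
    return [f"{fid}\t{iid}\t{sex}" for (fid, iid), sex in sex_of.items()]


print("\n".join(parents_rows(trios)))
print("\n".join(sex_rows(trios)))
```

With the two outputs saved as, say, update_parents.txt and update_sex.txt (file names chosen here for illustration), a follow-up run along the lines of plink --file data_updated_ids --update-parents update_parents.txt --update-sex update_sex.txt --recode --out data_trios should attach the trio information, after which plink --file data_trios --mendel can report the Mendel errors.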
