Question: Missing genotypes in a case-control study using PLINK
cristianrohr768 (Spain), 5.5 years ago, wrote:

Hello,

I have a sequencing run from an Ion PGM.

We sequenced 96 barcodes (individuals) and 310 amplicons (chromosomal regions).

32 barcodes are controls and 64 are cases.

We did the variant calling and got 96 VCF files, one per sample.
We combined them into a single VCF file using GATK; it contains 90 distinct SNPs across the samples.

We then converted the combined VCF file to PLINK format (.map and .ped files) using vcftools.
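Where do the `0 0` pairs come from? When per-sample VCFs are combined, a sample with no call at a site typically ends up with the genotype `./.`, and vcftools writes that as PLINK's missing code `0 0`. The Python sketch below models that mapping; it is a simplification (diploid calls only, half-calls such as `0/.` treated as fully missing), not vcftools' actual code, and `gt_to_ped_alleles` is a hypothetical helper name.

```python
from typing import List, Tuple


def gt_to_ped_alleles(gt: str, ref: str, alts: List[str]) -> Tuple[str, str]:
    """Translate a diploid VCF GT string ('0/1', '1|1', './.') into a PED allele pair.

    Hypothetical helper for illustration only. For simplicity, any call with a
    missing allele index ('.') becomes PLINK's missing pair ('0', '0').
    """
    alleles = [ref] + alts                   # index 0 = REF, 1.. = ALT alleles
    parts = gt.replace("|", "/").split("/")  # treat phased and unphased calls alike
    if len(parts) != 2:
        raise ValueError(f"expected a diploid GT, got {gt!r}")
    if "." in parts:
        return ("0", "0")                    # no-call -> missing genotype in the .ped
    return (alleles[int(parts[0])], alleles[int(parts[1])])


print(gt_to_ped_alleles("0/1", "C", ["T"]))  # ('C', 'T')
print(gt_to_ped_alleles("./.", "C", ["T"]))  # ('0', '0')
```

Note that the .ped output loses the distinction between "no reads here" and "not called for another reason"; both become `0 0`.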

Now we want to use PLINK to run an association test.

The .ped file looks like this:

1 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0
2 2 0 0 0 1 0 0 0 0 0 0 0 0 T C
3 3 0 0 0 2 0 0 0 0 0 0 0 0 T C
4 4 0 0 0 2 0 0 C T 0 0 0 0 T C
5 5 0 0 0 1 0 0 0 0 0 0 0 0 0 0
6 6 0 0 0 2 0 0 0 0 0 0 0 0 0 0
7 7 0 0 0 2 0 0 0 0 0 0 0 0 0 0
8 8 0 0 0 1 0 0 0 0 0 0 0 0 T C
9 9 0 0 0 2 0 0 0 0 0 0 0 0 T C
10 10 0 0 0 2 0 0 0 0 0 0 0 0 T C
11 11 0 0 0 1 0 0 0 0 0 0 0 0 0 0
12 12 0 0 0 2 0 0 C T 0 0 0 0 T C


As you can see, there are a lot of missing genotypes. What is the standard practice in this case?
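To quantify the missingness before deciding what to do, it can be tabulated per sample and per variant. PLINK's `--missing` flag writes these reports (.imiss and .lmiss); the Python sketch below only illustrates the same calculation on raw .ped lines, assuming the six standard leading columns (FID IID PAT MAT SEX PHENO) and `0` as the missing-allele code.

```python
from typing import Dict, Iterable, List, Tuple


def missing_rates(ped_lines: Iterable[str]) -> Tuple[Dict[str, float], List[float]]:
    """Per-sample and per-variant missing-genotype fractions for PED rows.

    Assumes six leading columns, then two allele columns per variant, with '0'
    marking a missing allele. A genotype counts as missing if either allele is '0'.
    """
    rows = [line.split() for line in ped_lines if line.strip()]
    n_var = (len(rows[0]) - 6) // 2
    var_missing = [0] * n_var
    per_sample: Dict[str, float] = {}
    for row in rows:
        calls = row[6:]
        n_miss = 0
        for j in range(n_var):
            if "0" in (calls[2 * j], calls[2 * j + 1]):
                n_miss += 1
                var_missing[j] += 1
        per_sample[row[1]] = n_miss / n_var  # keyed by IID (column 2)
    per_variant = [m / len(rows) for m in var_missing]
    return per_sample, per_variant
```

Applied to the 12 rows shown above, variants 1, 3 and 4 are missing in every sample, variant 2 is called in only 2 of 12 samples, and variant 5 in 7 of 12.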

Should we assume that the missing genotypes are homozygous reference? Most of them probably are, but some could be genuinely missing data, and I guess the only way to tell is to check the BAM files.

If we do treat missing genotypes as reference, is there a PLINK command that fills them in automatically?

Thanks,

Cristian

chrchang523 (United States), 5.5 years ago, wrote:

The VCF reference merge documentation describes how to do this for a single genome (it should be pretty straightforward to extend this to all your samples):

https://www.cog-genomics.org/plink2/data#merge_vcf_example

The key step is --merge with --merge-mode 5, which keeps the base genotype if the --merge genotype is missing, and otherwise uses the data in the --merge file.  So make the base fileset contain just reference information (and copy the real FID/IID over the reference FID/IID so they match), and you're golden.
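To make the merge rule concrete, here is a small Python model of the behaviour described above: a base row of homozygous-reference genotypes, overlaid with the observed calls so that only missing calls fall back to the reference. It models the rule, not PLINK's implementation. The REF alleles would have to come from the VCF's REF column, since a .map/.ped pair does not record which allele is the reference; the helper names and the REF alleles in the example are hypothetical. Keep the caveat from the question in mind: this turns every no-call into a reference call, including sites that simply had no coverage.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]


def is_missing(g: Genotype) -> bool:
    return "0" in g  # '0' is PLINK's missing-allele code


def reference_base(ref_alleles: List[str]) -> List[Genotype]:
    """Hypothetical 'reference-only' row: homozygous REF at every variant."""
    return [(ref, ref) for ref in ref_alleles]


def overlay(base: List[Genotype], observed: List[Genotype]) -> List[Genotype]:
    """Use the observed call where present; keep the base call where it is missing."""
    if len(base) != len(observed):
        raise ValueError("base and observed rows must cover the same variants")
    return [b if is_missing(o) else o for b, o in zip(base, observed)]


# Hypothetical REF alleles for five variants, overlaid with sample 4's calls from the question.
ref_alleles = ["A", "C", "G", "G", "T"]
sample4 = [("0", "0"), ("C", "T"), ("0", "0"), ("0", "0"), ("T", "C")]
print(overlay(reference_base(ref_alleles), sample4))
# [('A', 'A'), ('C', 'T'), ('G', 'G'), ('G', 'G'), ('T', 'C')]
```

In the PLINK workflow the same idea applies per sample, which is why the base fileset's FID/IID must match the real samples: the overlay is done genotype by genotype for each matching individual and variant.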
