9.0 years ago
joaofadista
Hi,
I would like to know how to handle the following operation on a VCF file. I have 2 siblings for each of 500 mothers (1500 samples in total). They are all genome-wide genotyped and stored in a single VCF file. What I want is to create 3 VCF files from the original one:
- vcf1: the siblings' genotypes are kept only at variants where their mother is homozygous reference. At sites where their mother is not homozygous reference, the genotypes of the respective siblings are set to missing.
- vcf2: the siblings' genotypes are kept only at variants where their mother is homozygous alternative. At sites where their mother is not homozygous alternative, the genotypes of the respective siblings are set to missing.
- vcf3: the siblings' genotypes are kept only at variants where their mother is heterozygous. At sites where their mother is not heterozygous, the genotypes of the respective siblings are set to missing.
Example of desired vcf1:
        Sa1  Sa2  Ma   Sb1  Sb2  Mb
rs001   0/1  0/1  0/0  ./.  ./.  0/1
rs002   ./.  ./.  0/1  0/0  0/1  0/0
S = sibling, M = mother
Thanks in advance,
João
All 1.5k samples are in one file?
If all your samples are in the same VCF file, you might need to write your own script for this job.
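As a starting point, here is a minimal sketch of such a script in plain Python. It assumes an uncompressed, tab-separated VCF with GT as the first FORMAT key, and a sibling-to-mother mapping taken from your pedigree. The names `classify`, `mask_siblings` and `mother_of` are made up for this illustration and do not come from any existing tool. Multi-allelic genotypes such as 1/2 are counted as heterozygous and 2/2 as homozygous alternative, which may or may not be what you want.

```python
import re

def classify(sample_field):
    """Classify a sample field such as '0/1:35' as 'hom_ref', 'het' or 'hom_alt'.

    Returns None for missing or non-diploid genotypes.
    """
    gt = sample_field.split(":", 1)[0]       # GT is assumed to be the first key
    alleles = re.split(r"[/|]", gt)          # handles unphased and phased calls
    if len(alleles) != 2 or "." in alleles:
        return None
    if alleles[0] == alleles[1]:
        return "hom_ref" if alleles[0] == "0" else "hom_alt"
    return "het"

def mask_siblings(vcf_lines, mother_of, keep_class):
    """Yield VCF lines in which a sibling's genotype is kept only at sites
    where the mother's genotype class equals keep_class.

    mother_of: dict mapping sibling sample name -> mother sample name
               (hypothetical input, e.g. built from a pedigree file).
    keep_class: 'hom_ref', 'het' or 'hom_alt'.
    """
    col = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):            # meta-information lines pass through
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):        # header: record sample columns
            col = {name: i for i, name in enumerate(fields) if i >= 9}
            yield line
            continue
        for sib, mom in mother_of.items():
            if classify(fields[col[mom]]) != keep_class:
                # Whole field becomes './.'; trailing sub-fields are dropped.
                fields[col[sib]] = "./."
        yield "\t".join(fields)

if __name__ == "__main__":
    # Toy data using the sample names from the question; genotypes are invented.
    header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                        "INFO", "FORMAT", "Sa1", "Sa2", "Ma", "Sb1", "Sb2", "Mb"])
    rows = [
        "\t".join(["1", "100", "rs001", "A", "G", ".", ".", ".", "GT",
                   "0/1", "0/1", "0/0", "1/1", "0/1", "0/1"]),
        "\t".join(["1", "200", "rs002", "C", "T", ".", ".", ".", "GT",
                   "0/1", "1/1", "0/1", "0/0", "0/1", "0/0"]),
    ]
    mother_of = {"Sa1": "Ma", "Sa2": "Ma", "Sb1": "Mb", "Sb2": "Mb"}
    for out in mask_siblings([header] + rows, mother_of, "hom_ref"):
        print(out)
```

Running the same function three times with `"hom_ref"`, `"hom_alt"` and `"het"` would give vcf1, vcf2 and vcf3. The mothers' own columns are left untouched, as in the example in the question.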
What do you want to achieve by separating the files? Maybe there are better ways to do what you want to do without generating all those files?
Thanks for the replies. Yes, all 1.5k samples are in one VCF file. I will try to write a script then...
I will explain my problem. For each of the 500 mothers I have 2 siblings, one affected and one not affected by a disease trait (they may be full or only half sibs). Since I have genome-wide data on both siblings and their respective mothers, I would like to find mother-child genotype pairs that affect the risk of a child developing this disease. Initially I thought of doing a GWAS on the siblings (correcting for relatedness), stratified by the genotype of the mothers. Do you recommend a better way to do this?
Thanks in advance.
If you have the pedigree file, you can try plinkseq or kggseq to identify de novo mutations in your disease samples. Tools designed to detect somatic mutations in cancer might also be a good direction to look at: the goal should be similar, only the samples are slightly different.