Question: VCF filtering and query
joaofadista (Denmark) wrote, 4.7 years ago:

Hi,

I would like to know how I can run the following query on a VCF file. I have 2 siblings for each of 500 mothers (1,500 samples in total). They are all genotyped genome-wide and stored in one VCF file. What I want is to create 3 VCF files from the original one:

  • One where the siblings' genotype at each variant is kept only if their mother is homozygous reference. At sites where the mother is not homozygous reference, the respective siblings' genotypes are set to missing (vcf1)
  • One where the siblings' genotype at each variant is kept only if their mother is homozygous alternative. At sites where the mother is not homozygous alternative, the respective siblings' genotypes are set to missing (vcf2)
  • One where the siblings' genotype at each variant is kept only if their mother is heterozygous. At sites where the mother is not heterozygous, the respective siblings' genotypes are set to missing (vcf3)

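The per-family rule behind the three files can be sketched in Python. This is a minimal sketch assuming diploid GT strings such as `0/1` or `0|1`; the function names `classify_gt` and `mask_siblings` are hypothetical, not from any library. A missing maternal genotype matches none of the three classes, so that family's siblings end up masked in all three outputs.

```python
# Hypothetical helpers (not from any library); assumes diploid GT strings.

def classify_gt(gt):
    """Classify a GT string ('0/1', '1|1', './.') as 'homref', 'homalt',
    'het', or None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None
    if all(a == "0" for a in alleles):
        return "homref"
    if len(set(alleles)) == 1:
        return "homalt"  # any non-reference homozygote, e.g. 1/1 or 2/2
    return "het"         # includes multi-allelic genotypes such as 1/2


def mask_siblings(mother_gt, sib_gts, keep_class):
    """Return the siblings' GTs unchanged if the mother's class equals
    keep_class; otherwise replace every sibling GT with './.'."""
    if classify_gt(mother_gt) == keep_class:
        return list(sib_gts)
    return ["./."] * len(sib_gts)


# vcf1 keeps siblings only where the mother is 'homref' (0/0)
print(mask_siblings("0/0", ["0/1", "0/1"], "homref"))  # ['0/1', '0/1']
print(mask_siblings("0/1", ["0/0", "0/1"], "homref"))  # ['./.', './.']
```

Calling `mask_siblings` with `"homalt"` or `"het"` instead gives the rows for vcf2 and vcf3.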
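Example of desired vcf1 (rows are variants, columns are samples):

         Sa1   Sa2   Ma    Sb1   Sb2   Mb
rs001    0/1   0/1   0/0   ./.   ./.   0/1
rs002    ./.   ./.   0/1   0/0   0/1   0/0

S = sibling, M = mother; the letters a and b mark the two families (Sa1 and Sa2 are the children of Ma).
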
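To sanity-check an output file against the vcf1 rule, a small checker can confirm that, in each family, the siblings are non-missing only where the mother is 0/0. This is a hypothetical sketch (`is_homref` and `check_vcf1_row` are illustrative names); it checks only the masking, not that the kept genotypes match the input. Applied to the two example rows, written with `./.` for every missing sibling genotype, both rows pass.

```python
# Hypothetical checker for the vcf1 rule; not from any library.

def is_homref(gt):
    """True if every allele in a GT string such as '0/0' or '0|0' is '0'."""
    return all(a == "0" for a in gt.replace("|", "/").split("/"))


def check_vcf1_row(row, families):
    """Return True if, in every family, the siblings are non-missing
    only when the mother is 0/0.

    row      -- dict mapping sample name to GT string
    families -- dict mapping a mother's name to a list of her children's names
    """
    for mother, sibs in families.items():
        if not is_homref(row[mother]):
            if any(row[s] != "./." for s in sibs):
                return False
    return True


families = {"Ma": ["Sa1", "Sa2"], "Mb": ["Sb1", "Sb2"]}
example = {
    "rs001": {"Sa1": "0/1", "Sa2": "0/1", "Ma": "0/0",
              "Sb1": "./.", "Sb2": "./.", "Mb": "0/1"},
    "rs002": {"Sa1": "./.", "Sa2": "./.", "Ma": "0/1",
              "Sb1": "0/0", "Sb2": "0/1", "Mb": "0/0"},
}
for rsid, row in example.items():
    print(rsid, check_vcf1_row(row, families))  # both rows print True
```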
Thanks in advance,

João


Are all 1.5k samples in one file?

Reply by Dan, 4.7 years ago

If all your samples are in the same VCF file, you might need to write your own script for this job.

What do you want to achieve by separating the files? Maybe there are better ways to do it without generating all those files?

Reply by Sam, 4.7 years ago
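A minimal sketch of such a script in plain Python (standard library only), assuming an uncompressed VCF whose sample names match a family list. The output file names, the `families.tsv` layout (one line per family: mother, then her children, whitespace-separated) and all function names are hypothetical. Only the GT subfield of a masked sibling is replaced; other FORMAT subfields are left untouched. For a bgzipped `.vcf.gz`, `gzip.open(path, "rt")` could be used in place of `open`.

```python
import sys

# Hypothetical script; output order matches vcf1, vcf2, vcf3 in the question.
CLASSES = ("homref", "homalt", "het")


def classify_gt(gt):
    """'homref', 'homalt', 'het', or None for a missing diploid GT string."""
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None
    if all(a == "0" for a in alleles):
        return "homref"
    return "homalt" if len(set(alleles)) == 1 else "het"


def set_gt_missing(sample_field, gt_index):
    """Replace only the GT subfield of one sample column with './.'."""
    parts = sample_field.split(":")
    parts[gt_index] = "./."
    return ":".join(parts)


def split_by_mother(lines, families):
    """Yield (class, line) pairs. Header lines go to all three outputs;
    each data line is emitted once per class, with a family's siblings
    masked wherever the mother's genotype is not of that class.

    families -- dict mapping a mother's sample name to her children's names
    """
    col = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            if line.startswith("#CHROM"):
                # map sample names to their column indices
                col = {name: i for i, name in enumerate(line.split("\t"))}
            for c in CLASSES:
                yield c, line
            continue
        fields = line.split("\t")
        # raises ValueError if the FORMAT column has no GT subfield
        gt_index = fields[8].split(":").index("GT")
        mother_class = {m: classify_gt(fields[col[m]].split(":")[gt_index])
                        for m in families}
        for c in CLASSES:
            out = list(fields)
            for mother, kids in families.items():
                if mother_class[mother] != c:
                    for kid in kids:
                        out[col[kid]] = set_gt_missing(out[col[kid]], gt_index)
            yield c, "\t".join(out)


def main(vcf_path, families_path):
    families = {}
    with open(families_path) as fh:
        for row in fh:
            if row.strip():
                mother, *kids = row.split()
                families[mother] = kids
    outs = {c: open(f"vcf_{c}.vcf", "w") for c in CLASSES}
    try:
        with open(vcf_path) as vcf:
            for c, line in split_by_mother(vcf, families):
                outs[c].write(line + "\n")
    finally:
        for fh in outs.values():
            fh.close()


# Only runs when called as: python split_vcf.py input.vcf families.tsv
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Emitting every data line once per class keeps this to a single pass over the 1.5k-sample file, writing `vcf_homref.vcf`, `vcf_homalt.vcf` and `vcf_het.vcf` (vcf1 to vcf3) side by side.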

Thanks for the replies. Yes, all 1.5k samples are in one VCF file. I will try to write a script, then.

Let me explain my problem. For each of the 500 mothers I have 2 siblings, one affected and one unaffected by a disease trait (they may be full or half sibs). Since I have genome-wide data on both siblings and their mothers, I would like to find mother-child genotype pairs that affect a child's risk of developing this disease. My initial idea was to run a GWAS on the siblings (correcting for relatedness), stratified by the mothers' genotypes. Would you recommend a better way to do this?

Thanks in advance.

Reply by joaofadista, 4.7 years ago (edited 8 months ago by RamRS)
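As a descriptive first step toward the stratified analysis described above, the sibling genotypes at one variant can be tabulated per maternal genotype. This is a minimal sketch with hypothetical names; it assumes exactly one affected and one unaffected sibling per family and counts non-reference alleles. It computes no test statistic and no relatedness correction; because each family contributes a matched affected/unaffected pair, a formal test would need to respect that pairing.

```python
from collections import defaultdict

# Hypothetical helpers; descriptive counts only, no association test.


def alt_dosage(gt):
    """Number of non-reference alleles in a diploid GT, or None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None
    return sum(a != "0" for a in alleles)


def stratify_pairs(families):
    """Tabulate alt-allele counts in affected vs unaffected siblings,
    stratified by the mother's alt-allele dosage at one variant.

    families -- iterable of (mother_gt, affected_gt, unaffected_gt) tuples
    Returns {mother_dosage: [affected_alt, unaffected_alt, n_families]}.
    """
    table = defaultdict(lambda: [0, 0, 0])
    for mother_gt, aff_gt, unaff_gt in families:
        m, a, u = alt_dosage(mother_gt), alt_dosage(aff_gt), alt_dosage(unaff_gt)
        if None in (m, a, u):
            continue  # skip families with any missing genotype
        table[m][0] += a
        table[m][1] += u
        table[m][2] += 1
    return dict(table)


print(stratify_pairs([("0/0", "0/1", "0/0"), ("0/1", "1/1", "0/1"),
                      ("0/0", "0/0", "0/0")]))
# {0: [1, 0, 2], 1: [2, 1, 1]}
```

For a biallelic site, maternal dosage 0, 1 and 2 correspond to the homozygous reference, heterozygous and homozygous alternative strata (vcf1, vcf3 and vcf2).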

If you have the pedigree file, you can try plinkseq or kggseq to identify de novo mutations in your affected samples. Tools designed to detect somatic mutations in cancer might also be a good direction to look at: the goal is similar, only the sample design is slightly different.

Reply by Sam, 4.7 years ago (edited 8 months ago by RamRS)
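With mothers but no fathers genotyped, true de novo calls are not possible, but a simple mother-child check can flag genotypes that cannot have been inherited from the mother, i.e. the child shares no allele with her. The sketch below uses hypothetical function names and assumes diploid GT strings. Flagged sites may reflect de novo events, genotyping errors or deletions; this check is not a substitute for the pedigree-aware tools mentioned above.

```python
# Hypothetical mother-child compatibility check; not from any library.


def alleles(gt):
    """Set of alleles in a GT string, or None if any allele is missing."""
    parts = gt.replace("|", "/").split("/")
    if any(p in (".", "") for p in parts):
        return None
    return set(parts)


def mother_child_incompatible(mother_gt, child_gt):
    """True if the child carries no allele seen in the mother, which is
    inconsistent with inheriting one allele from her (e.g. 0/0 -> 1/1)."""
    m, c = alleles(mother_gt), alleles(child_gt)
    if m is None or c is None:
        return False  # a missing genotype cannot be judged
    return m.isdisjoint(c)


for mother, child in [("0/0", "1/1"), ("0/0", "0/1"), ("1/1", "0/1")]:
    print(mother, child, mother_child_incompatible(mother, child))
# 0/0 1/1 True
# 0/0 0/1 False
# 1/1 0/1 False
```

A child 0/1 with a 0/0 mother is not flagged, because the alternative allele may simply come from the untyped father.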
vassialk (Belarus) wrote, 4.7 years ago:

Nextgene software makes it easy to work with VCFs and reports; alternatively, try VCFMiner and VCFtools. Ugene has a better viewer, though many of its other functions are weak.

Powered by Biostar version 2.3.0