Question: VCF Format Genotyping Selection For Multisamples
6.7 years ago by
michealsmith wrote:

For a VCF file containing information for multiple samples, like the one below:

#CHROM POS   ID        REF ALT QUAL FILTER INFO                    FORMAT      NA00001        NA00002        NA00003
20     14370 rs6054257 G   A   29   PASS   NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0/0:48:1:51,51 1/0:48:8:51,51 1/1:43:5:.

In genetic analysis involving a familial pedigree, we usually want to compare genotypes among different samples (parent vs. child). For example, I now want to select the SNPs that appear in all samples, which means the genotype (GT) field for all three samples should be 0/1 or 1/1.

I know it can be done with some bash commands (and this is what I'm doing right now); I'm just curious whether VCFtools has any built-in function for such a comparison.

thx


I don't think VCFtools (http://vcftools.sourceforge.net/docs.html) has the functionality we need for this (if it does, I can't find it...), which is why most people write their own Perl or Python scripts to filter their data for pedigree analysis at this stage. You could do it in bash as well, I guess, and search for each genotype flag as a regex.

written 6.7 years ago by Alex Paciorkowski
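
The kind of script-based filter suggested above could look roughly like this Python sketch. It keeps only records in which every sample's GT carries at least one non-reference allele (0/1, 1/0 or 1/1, phased or unphased). The function names `carries_alt` and `shared_variant_lines` are made up for illustration, the second record in `example` is hypothetical, and the whitespace fallback is there only because the example pasted in the question is space-aligned; a real VCF is tab-delimited.

```python
import re


def carries_alt(sample_field, gt_index=0):
    """Return True if the sample's GT has at least one non-reference allele."""
    gt = sample_field.split(":")[gt_index]
    alleles = re.split(r"[/|]", gt)  # "/" = unphased, "|" = phased
    # "0" is the reference allele and "." is a missing call.
    return any(a not in ("0", ".") for a in alleles)


def shared_variant_lines(lines):
    """Yield header lines plus records where every sample carries an ALT allele."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            yield line
            continue
        # Prefer tabs (real VCF); fall back to whitespace for pasted examples.
        fields = line.split("\t") if "\t" in line else line.split()
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")
        samples = fields[9:]
        if samples and all(carries_alt(s, gt_index) for s in samples):
            yield line


example = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003",
    # Record from the question: NA00001 is 0/0, so it is dropped.
    "20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ "
    "0/0:48:1:51,51 1/0:48:8:51,51 1/1:43:5:.",
    # Hypothetical record in which all three samples carry the ALT allele.
    "20 17330 . T A 30 PASS NS=3 GT:GQ 0/1:49 1/1:30 0/1:41",
]

for kept in shared_variant_lines(example):
    print(kept)
```

Note that under this rule the record shown in the question would be filtered out, because NA00001 is homozygous reference. To run it on a file, something like `shared_variant_lines(open("input.vcf"))` should work, where `input.vcf` is a placeholder name.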

What is the end goal? Perhaps you want to do more advanced analyses? Do you want to phase the data? Are you looking for a locus that might be disease-causing?

modified 6.7 years ago • written 6.7 years ago by Zev.Kronenberg

Yeah, Zev, I need to find the disease-causing SNP, i.e. to find which SNP segregates with the disease according to the pedigree.

written 6.7 years ago by michealsmith

If you have the variants called, which it looks like you do, why not let VAAST do the work for you? Our lab developed VAAST and our mailing list is very friendly.

written 6.7 years ago by Zev.Kronenberg

Zev, I agree VAAST looks like an interesting tool. I can see how it may be helpful in identifying pathogenic variants in multigenic disease models, but how does it improve gene finding in autosomal recessive or dominant models where there is one causative gene? If @gerrybio2010 is looking for one gene, pulling out variants shared/not shared by proband and parents with a script will do the trick. I haven't used VAAST, but am certainly willing to give it a try.

modified 6.7 years ago • written 6.7 years ago by Alex Paciorkowski
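
As a rough illustration of "pulling out variants shared/not shared by proband and parents", here is a minimal Python sketch that labels one site of a trio from the three GT strings. The function names and the category labels are invented for this example, and it assumes you already know which sample column is the proband, the father and the mother. Any non-zero allele is counted as ALT, so multi-allelic sites (e.g. 1/2) are treated loosely.

```python
import re


def alt_count(gt):
    """Number of non-reference alleles in a GT string, or None if any allele is missing."""
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return None
    return sum(a != "0" for a in alleles)


def trio_pattern(proband_gt, father_gt, mother_gt):
    """Label a site by how the ALT allele is distributed across a trio."""
    counts = [alt_count(g) for g in (proband_gt, father_gt, mother_gt)]
    if None in counts:
        return "incomplete"            # at least one genotype is not fully called
    child, father, mother = counts
    if child == 0:
        return "absent_in_proband"
    if father == 0 and mother == 0:
        return "proband_only"          # not seen in either parent (candidate de novo)
    if child == 2 and father == 1 and mother == 1:
        return "recessive_candidate"   # two ALT alleles in proband, one in each parent
    return "shared_with_parent"


print(trio_pattern("0/1", "0/0", "0/0"))
print(trio_pattern("1/1", "0/1", "1/0"))
```

Which label is interesting depends on the disease model you have in mind: "proband_only" for a dominant de novo hypothesis, "recessive_candidate" for an autosomal recessive one.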

Yes, binary filtering can do the trick. But what if you are missing data? The binary filter can remove the causal variant if the parents don't have coverage. VAAST takes a probabilistic approach, using knowledge of the trio and the frequencies of the alleles in a background file such as 1000 Genomes. Secondly, VAAST scores how deleterious a mutation is by using BLOSUM matrices and OMIM data.

modified 6.7 years ago • written 6.7 years ago by Zev.Kronenberg
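
To make the missing-data point concrete, here is a small Python sketch comparing a strict binary filter with a more tolerant one that does not throw a site away just because a sample has no call. This is only a toy illustration of the problem, not VAAST's probabilistic method, and the names `call_state`, `strict_keep` and `tolerant_keep` are made up.

```python
import re


def call_state(gt):
    """Classify a GT string as 'alt', 'ref' or 'missing'."""
    alleles = re.split(r"[/|]", gt)
    if all(a == "." for a in alleles):
        return "missing"               # no coverage / no call at all, e.g. "./."
    if any(a not in ("0", ".") for a in alleles):
        return "alt"                   # at least one non-reference allele
    return "ref"                       # only reference alleles were called


def strict_keep(gts):
    """Binary filter: every sample must visibly carry the ALT allele."""
    return all(call_state(g) == "alt" for g in gts)


def tolerant_keep(gts):
    """Keep the site unless some sample is confidently called as reference-only."""
    states = [call_state(g) for g in gts]
    return "alt" in states and "ref" not in states


site = ["0/1", "./.", "1/1"]           # second sample has no coverage
print(strict_keep(site), tolerant_keep(site))
```

With an uncalled sample the strict filter drops the site while the tolerant one keeps it for follow-up, which is the situation described above.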
6.6 years ago by
thamathpanda wrote:

Bro.......... RTFM

from VCFtools

About:

Merges VCF files by position, creating multi-sample VCFs from fewer-sample VCFs. The tool requires bgzipped and tabix indexed VCF files on input. (E.g. bgzip file.vcf; tabix -p vcf file.vcf.gz) If you need to concatenate VCFs (e.g. files split by chromosome), look at vcf-concat instead.

Usage: vcf-merge [OPTIONS] file1.vcf file2.vcf.gz ... > out.vcf
