Finding unique variants by sample ID in a multisample vcf
3.0 years ago
tacrolimus ▴ 140

Dear Biostars community,

I have 70 gVCFs from patients with a rare disease. I am interested in three regions, so I merged the gVCFs into a single multi-sample VCF restricted to those three regions and annotated it with VEP. Two of the regions are known to be associated with the disease in question, and the third is under investigation.

I want to pick out individuals who have variants ONLY in the region under investigation, i.e. a patient with a variant in either of the two "known" regions is excluded. The desired output is a list of sample IDs that I could then use to subset the VCF, giving a VCF of variants in those people who only have variants in the region of interest.

I'm not sure how to approach this and was wondering whether a bcftools method would be helpful. bcftools isec seems to be variant-based rather than sample-ID-based.

Many thanks for your help

bcftools vcf SNP vep
3.0 years ago
brunobsouzaa ▴ 810

If you have a gVCF file with genotypes such as 1/1, 0/1, 0/0 or ./., you could first use vcftools --positions (or --bed, if your regions are intervals rather than single sites) to select only your regions of interest, and then use vcftools again to filter out individuals whose genotypes are 0/0 or ./.
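To make the sample-level logic concrete, here is a rough Python sketch of the same idea: collect the samples that carry a non-reference genotype in the region under investigation, then drop any that also carry one in a "known" region. It is only an illustration under some assumptions: the region coordinates and sample names are made up, the VCF is uncompressed text, and a record is assigned to a region by its POS alone.

```python
import io

# Hypothetical coordinates (chrom, start, end), 1-based inclusive.
# Replace these with the real regions; they are placeholders only.
KNOWN_REGIONS = [("chr1", 100, 200), ("chr2", 500, 600)]
TARGET_REGION = ("chr3", 1000, 1100)


def in_region(chrom, pos, region):
    """True if chrom:pos falls inside the (chrom, start, end) region."""
    r_chrom, start, end = region
    return chrom == r_chrom and start <= pos <= end


def carries_alt(sample_field, gt_index):
    """True if the GT sub-field has at least one non-reference allele."""
    parts = sample_field.split(":")
    if gt_index >= len(parts):
        return False
    alleles = parts[gt_index].replace("|", "/").split("/")
    # "0" is the reference allele and "." is a missing call.
    return any(a not in ("0", ".") for a in alleles)


def samples_only_in_target(vcf_lines, target, known):
    """Samples with a variant in `target` and none in any `known` region."""
    samples = []
    in_target, in_known = set(), set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the FORMAT column
            continue
        chrom, pos = fields[0], int(fields[1])
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")
        if in_region(chrom, pos, target):
            hits = in_target
        elif any(in_region(chrom, pos, r) for r in known):
            hits = in_known
        else:
            continue
        for name, field in zip(samples, fields[9:]):
            if carries_alt(field, gt_index):
                hits.add(name)
    return sorted(in_target - in_known)


# Tiny made-up VCF: S1 has variants in a known region and the target,
# S2 only in the target, S3 nowhere.
DEMO_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "chr1\t150\t.\tA\tG\t.\t.\t.\tGT\t0/1\t0/0\t./.",
    "chr3\t1050\t.\tC\tT\t.\t.\t.\tGT\t0/1\t1/1\t0/0",
])

result = samples_only_in_target(io.StringIO(DEMO_VCF), TARGET_REGION, KNOWN_REGIONS)
print(result)  # ['S2']
```

The resulting IDs could be written one per line to a file and passed to bcftools view -S to subset the multi-sample VCF.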

Hope that helps!


Thanks. I think I can do this in bcftools, which I am more comfortable with; I had not thought of this simple approach to the problem.

All the best

