Use bcftools to find carriers of variants
2.2 years ago

I would think this is a common task, but I cannot find a bcftools approach for it: when looking at large VCF files with (rare) variants, I often want to find the sample identifiers of the carrier(s). One solution could be to add an INFO field such as "CARRIERS=<list>"; alternatively, export the VCF to a TSV file.
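For reference, the TSV-export idea can be sketched without Python as a small awk filter placed downstream of `bcftools view`. This is a minimal sketch, not a bcftools feature: `list_carriers` is a hypothetical helper name, it assumes an uncompressed, tab-separated VCF on stdin, and it counts a sample as a carrier when any allele in its GT is neither 0 nor missing.

```shell
# list_carriers (hypothetical helper): read an uncompressed VCF on stdin and
# print CHROM, POS, REF, ALT and a comma-separated list of carrier samples
# ("." when a site has no carriers).
list_carriers() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^##/ { next }                                   # skip meta-information lines
    /^#CHROM/ {                                      # remember sample names by column
      for (i = 10; i <= NF; i++) name[i] = $i
      next
    }
    {
      # locate the GT subfield within the FORMAT column
      n = split($9, fmt, ":"); gt = 0
      for (k = 1; k <= n; k++) if (fmt[k] == "GT") gt = k
      carriers = ""
      for (i = 10; gt && i <= NF; i++) {
        split($i, f, ":")
        m = split(f[gt], a, "[/|]")                  # works for phased and unphased GT
        for (j = 1; j <= m; j++)
          if (a[j] != "0" && a[j] != ".") {          # any non-ref, non-missing allele
            carriers = carriers (carriers == "" ? "" : ",") name[i]
            break
          }
      }
      out = (carriers == "") ? "." : carriers
      print $1, $2, $4, $5, out
    }'
}
```

A typical call would be `bcftools view input.vcf.gz | list_carriers > carriers.tsv`, letting bcftools handle decompression and any region or site filtering.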

I currently do this in Python with cyvcf2, which works great, but it feels like an unnecessary wrapper.

Any bcftools wizards? :)

2.2 years ago

bcftools query should help here, e.g.:

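bcftools query -i'GT="het"' -f '%CHROM  %POS  [%SAMPLE ]\n' variants.bcf

Because the filter uses a FORMAT field, only the samples matching it should be printed inside the [ ] block. Using -i'GT="alt"' instead also lists homozygous-alternate carriers.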
2.2 years ago

Using vcffilterjdk (http://lindenb.github.io/jvarkit/VcfFilterJdk.html), the following command adds an INFO/SAMPLES field containing the names of the HET or HOM_VAR samples:

bcftools view input.vcf.gz |\
awk '/^#CHROM/ {printf("##INFO=<ID=SAMPLES,Number=.,Type=String,Description=\"HET or HOM_VAR samples\">\n");} {print;}' |\
java -jar dist/vcffilterjdk.jar -e 'final List<String> L = variant.getGenotypes().stream().filter(G -> G.isHet() || G.isHomVar()).map(G -> G.getSampleName()).collect(Collectors.toList()); return new VariantContextBuilder(variant).attribute("SAMPLES", L).make();'
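If Java is not available, a similar INFO annotation can be approximated with awk alone. This is a hedged sketch, not part of jvarkit: `add_carriers_info` is a hypothetical helper, the `CARRIERS` tag name is borrowed from the question, records without carriers are left unchanged, and the input must be uncompressed VCF.

```shell
# add_carriers_info (hypothetical helper): copy a VCF from stdin to stdout,
# declare INFO/CARRIERS in the header, and append CARRIERS=<samples> to each
# record where at least one sample has a non-reference allele in GT.
add_carriers_info() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^##/ { print; next }                            # keep existing meta lines
    /^#CHROM/ {
      print "##INFO=<ID=CARRIERS,Number=.,Type=String,Description=\"Samples with a non-reference allele in GT\">"
      for (i = 10; i <= NF; i++) name[i] = $i       # remember sample names
      print; next
    }
    {
      n = split($9, fmt, ":"); gt = 0
      for (k = 1; k <= n; k++) if (fmt[k] == "GT") gt = k
      carriers = ""
      for (i = 10; gt && i <= NF; i++) {
        split($i, f, ":")
        m = split(f[gt], a, "[/|]")
        for (j = 1; j <= m; j++)
          if (a[j] != "0" && a[j] != ".") {
            carriers = carriers (carriers == "" ? "" : ",") name[i]
            break
          }
      }
      # replace a missing INFO (".") or append to existing INFO with ";"
      if (carriers != "") $8 = ($8 == "." ? "" : $8 ";") "CARRIERS=" carriers
      print
    }'
}
```

The output can be recompressed with bcftools, e.g. `bcftools view input.vcf.gz | add_carriers_info | bcftools view -Oz -o annotated.vcf.gz -`.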