Question: Samtools/bcftools force-calling using an input set of variants
gravatar for floris.barthel
3 months ago by
floris.barthel20 wrote:

I have an input set of variants in VCF format (input.vcf.gz), which includes indels and SNPs. Multi-allelic variants are split across multiple lines. Using bcftools view input.vcf.gz | vcf2bed | bedtools merge I then created a bed file targets.bed.

I would like to call only this predetermined list of variants using samtools and/or bcftools (or other tools). How would I go about doing this?

I suppose I could specify the targets.bed file as a parameter to bcftools mpileup:

bcftools mpileup -R targets.bed --fasta-ref human_g1k_v37_decoy.fasta input.bam

... but that would still not allow me to specifically call the given list of variants in input.vcf.gz. Any suggestions what I could do here?

ADD COMMENTlink modified 3 months ago • written 3 months ago by floris.barthel20
gravatar for finswimmer
3 months ago by
finswimmer9.9k wrote:


haven't test it, but the -C paramter for bcftools call looks like this is what you want to do:

 -C, --constrain alleles|trio

        call genotypes given alleles. See also -T, --targets-file. 
        call genotypes given the father-mother-child constraint. See also -s, --samples and -n, --novel-rate.

Take care of the description how the target file during calling have to look like:

 -T, --targets-file [^]FILE
    Same -t, --targets, but reads regions from a file. Note that -T cannot be used in combination with -t. 
    With the call -C alleles command, third column of the targets file must be comma-separated list of alleles, starting with the reference allele. Note that the file must be compressed and index. Such a file can be easily created from a VCF using: 

    bcftools query -f'%CHROM\t%POS\t%REF,%ALT\n' file.vcf | bgzip -c > als.tsv.gz && tabix -s1 -b2 -e2 als.tsv.gz

So the full command could look like this:

$ bcftools mpileup -Ou -R targets.bed -f genome.fa input.bam | bcftools call -A -C alleles -T als.tsv.gz -m -Ov - > allels.vcf

fin swimmer

ADD COMMENTlink modified 3 months ago • written 3 months ago by finswimmer9.9k

Thanks ! This does the trick. I'm adding the -A parameter to bcftools call to include alts even if not present in the genotypes. Also, interestingly the -v (verbose) flag breaks bcftools call and I had to remove it to get anything to output. Not sure why. Unfortunately I ran into this issue: when trying to add AD to bcftools mpileup which luckily was fixed but requires a github developmental branch installation to fix.

P.s. it would make things a bit easier if bcftools call -T allowed a VCF as input rather than the specified targets format it currently requires. I wonder why this is? It's not very intuitive and though I had read this part of the bcftools manual I somehow had discarded it as unlikely to work for indels (even though it does seem to now after testing).

ADD REPLYlink written 3 months ago by floris.barthel20

BTW, I don't clearly understand the difference between bcftools mpileup -T file.bed and bcftools mpileup -R file.bed. For a small testcase, -R seems to be significantly faster.

ADD REPLYlink written 3 months ago by floris.barthel20

Hello floris.barthel ,

it was my fault with the -v. -v in bcftools call is for printing variant sites only and this is not what we intend to do when using -C. So this two option are in conflict.

Also using -R for specifying the regions during mpileup is a better choice than -T. If using -R bcftools can directly jump to the position in the file. If using -T it must iterate over the whole file to find the regions listed.

I edit my answer now :)

fin swimmer

ADD REPLYlink written 3 months ago by finswimmer9.9k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1926 users visited in the last hour