Samtools/bcftools force-calling using an input set of variants
1
0
Entering edit mode
3.2 years ago

I have an input set of variants in VCF format (input.vcf.gz), which includes indels and SNPs. Multi-allelic variants are split across multiple lines. Using bcftools view input.vcf.gz | vcf2bed | bedtools merge I then created a bed file targets.bed.

I would like to call only this predetermined list of variants using samtools and/or bcftools (or other tools). How would I go about doing this?

I suppose I could specify the targets.bed file as a parameter to bcftools mpileup:

bcftools mpileup -R targets.bed --fasta-ref human_g1k_v37_decoy.fasta input.bam


... but that would still not allow me to specifically call the given list of variants in input.vcf.gz. Any suggestions what I could do here?

bcftools samtools variant calling vcf • 2.5k views
5
Entering edit mode
3.2 years ago

Hello,

haven't test it, but the -C paramter for bcftools call looks like this is what you want to do:

 -C, --constrain alleles|trio

alleles
trio
call genotypes given the father-mother-child constraint. See also -s, --samples and -n, --novel-rate.


Take care of the description how the target file during calling have to look like:

 -T, --targets-file [^]FILE
Same -t, --targets, but reads regions from a file. Note that -T cannot be used in combination with -t.
With the call -C alleles command, third column of the targets file must be comma-separated list of alleles, starting with the reference allele. Note that the file must be compressed and index. Such a file can be easily created from a VCF using:

bcftools query -f'%CHROM\t%POS\t%REF,%ALT\n' file.vcf | bgzip -c > als.tsv.gz && tabix -s1 -b2 -e2 als.tsv.gz


So the full command could look like this:

\$ bcftools mpileup -Ou -R targets.bed -f genome.fa input.bam | bcftools call -A -C alleles -T als.tsv.gz -m -Ov - > allels.vcf


fin swimmer

1
Entering edit mode

Thanks ! This does the trick. I'm adding the -A parameter to bcftools call to include alts even if not present in the genotypes. Also, interestingly the -v (verbose) flag breaks bcftools call and I had to remove it to get anything to output. Not sure why. Unfortunately I ran into this issue: https://github.com/samtools/bcftools/issues/887 when trying to add AD to bcftools mpileup which luckily was fixed but requires a github developmental branch installation to fix.

P.s. it would make things a bit easier if bcftools call -T allowed a VCF as input rather than the specified targets format it currently requires. I wonder why this is? It's not very intuitive and though I had read this part of the bcftools manual I somehow had discarded it as unlikely to work for indels (even though it does seem to now after testing).

0
Entering edit mode

BTW, I don't clearly understand the difference between bcftools mpileup -T file.bed and bcftools mpileup -R file.bed. For a small testcase, -R seems to be significantly faster.

2
Entering edit mode

Hello floris.barthel ,

it was my fault with the -v. -v in bcftools call is for printing variant sites only and this is not what we intend to do when using -C. So this two option are in conflict.

Also using -R for specifying the regions during mpileup is a better choice than -T. If using -R bcftools can directly jump to the position in the file. If using -T it must iterate over the whole file to find the regions listed.

I edit my answer now :)

fin swimmer