Question: Samtools/bcftools force-calling using an input set of variants
0
gravatar for floris.barthel
9 months ago by
floris.barthel40 wrote:

I have an input set of variants in VCF format (input.vcf.gz), which includes indels and SNPs. Multi-allelic variants are split across multiple lines. Using bcftools view input.vcf.gz | vcf2bed | bedtools merge I then created a bed file targets.bed.

I would like to call only this predetermined list of variants using samtools and/or bcftools (or other tools). How would I go about doing this?

I suppose I could specify the targets.bed file as a parameter to bcftools mpileup:

bcftools mpileup -R targets.bed --fasta-ref human_g1k_v37_decoy.fasta input.bam

... but that would still not allow me to specifically call the given list of variants in input.vcf.gz. Any suggestions what I could do here?

ADD COMMENTlink modified 9 months ago • written 9 months ago by floris.barthel40
3
gravatar for finswimmer
9 months ago by
finswimmer12k
Germany
finswimmer12k wrote:

Hello,

haven't test it, but the -C paramter for bcftools call looks like this is what you want to do:

 -C, --constrain alleles|trio

    alleles
        call genotypes given alleles. See also -T, --targets-file. 
    trio
        call genotypes given the father-mother-child constraint. See also -s, --samples and -n, --novel-rate.

Take care of the description how the target file during calling have to look like:

 -T, --targets-file [^]FILE
    Same -t, --targets, but reads regions from a file. Note that -T cannot be used in combination with -t. 
    With the call -C alleles command, third column of the targets file must be comma-separated list of alleles, starting with the reference allele. Note that the file must be compressed and index. Such a file can be easily created from a VCF using: 

    bcftools query -f'%CHROM\t%POS\t%REF,%ALT\n' file.vcf | bgzip -c > als.tsv.gz && tabix -s1 -b2 -e2 als.tsv.gz

So the full command could look like this:

$ bcftools mpileup -Ou -R targets.bed -f genome.fa input.bam | bcftools call -A -C alleles -T als.tsv.gz -m -Ov - > allels.vcf

fin swimmer

ADD COMMENTlink modified 9 months ago • written 9 months ago by finswimmer12k
1

Thanks ! This does the trick. I'm adding the -A parameter to bcftools call to include alts even if not present in the genotypes. Also, interestingly the -v (verbose) flag breaks bcftools call and I had to remove it to get anything to output. Not sure why. Unfortunately I ran into this issue: https://github.com/samtools/bcftools/issues/887 when trying to add AD to bcftools mpileup which luckily was fixed but requires a github developmental branch installation to fix.

P.s. it would make things a bit easier if bcftools call -T allowed a VCF as input rather than the specified targets format it currently requires. I wonder why this is? It's not very intuitive and though I had read this part of the bcftools manual I somehow had discarded it as unlikely to work for indels (even though it does seem to now after testing).

ADD REPLYlink written 9 months ago by floris.barthel40

BTW, I don't clearly understand the difference between bcftools mpileup -T file.bed and bcftools mpileup -R file.bed. For a small testcase, -R seems to be significantly faster.

ADD REPLYlink written 9 months ago by floris.barthel40
2

Hello floris.barthel ,

it was my fault with the -v. -v in bcftools call is for printing variant sites only and this is not what we intend to do when using -C. So this two option are in conflict.

Also using -R for specifying the regions during mpileup is a better choice than -T. If using -R bcftools can directly jump to the position in the file. If using -T it must iterate over the whole file to find the regions listed.

I edit my answer now :)

fin swimmer

ADD REPLYlink written 9 months ago by finswimmer12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1070 users visited in the last hour