I have an input set of variants in VCF format (input.vcf.gz), which includes indels and SNPs. Multi-allelic variants are split across multiple lines. Using bcftools view input.vcf.gz | vcf2bed | bedtools merge I then created a bed file targets.bed.
I would like to call only this predetermined list of variants using samtools and/or bcftools (or other tools). How would I go about doing this?
I suppose I could specify the targets.bed file as a parameter to bcftools mpileup:
bcftools mpileup -R targets.bed --fasta-ref human_g1k_v37_decoy.fasta input.bam
... but that would still not allow me to specifically call the given list of variants in input.vcf.gz. Any suggestions what I could do here?
Thanks ! This does the trick. I'm adding the
-Aparameter tobcftools callto include alts even if not present in the genotypes. Also, interestingly the-v(verbose) flag breaksbcftools calland I had to remove it to get anything to output. Not sure why. Unfortunately I ran into this issue: https://github.com/samtools/bcftools/issues/887 when trying to addADtobcftools mpileupwhich luckily was fixed but requires a github developmental branch installation to fix.P.s. it would make things a bit easier if
bcftools call -Tallowed a VCF as input rather than the specified targets format it currently requires. I wonder why this is? It's not very intuitive and though I had read this part of the bcftools manual I somehow had discarded it as unlikely to work for indels (even though it does seem to now after testing).BTW, I don't clearly understand the difference between
bcftools mpileup -T file.bedandbcftools mpileup -R file.bed. For a small testcase,-Rseems to be significantly faster.Hello floris.barthel ,
it was my fault with the
-v.-vinbcftools callis for printing variant sites only and this is not what we intend to do when using-C. So this two option are in conflict.Also using
-Rfor specifying the regions during mpileup is a better choice than-T. If using-Rbcftools can directly jump to the position in the file. If using-Tit must iterate over the whole file to find the regions listed.I edit my answer now :)
fin swimmer