How does bcftools decide what sample name to assign when calling variants?
Paul, 8 months ago

How does bcftools decide which sample names to assign in the VCF when calling variants with the mpileup and call commands? I'm using bcftools to call variants from an aligned BAM file like this:

samtools mpileup -A -d 100000 -Ou aligned.bam | bcftools call -mv -Oz -o variants.vcf.gz

In the output VCF there is only one sample, and it is named "aligned.bam", the same as the file name. How does bcftools decide what sample name to assign? Does it look at other information in the SAM/BAM file? The --read-groups option of the mpileup command implies that sample names can be modified based on read-group tags. If I specified sample names in the header of aligned.bam, would they become the sample names in my VCF by default?

I'll note that in this case aligned.bam has no header, so I have no complaints about what bcftools is doing. But I'd like to know how to control the behavior.

The broader context is that I actually have many alignments, each from a different individual, and I'd like to end up with a single unified VCF with properly named samples.
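For the many-individuals case, one option that avoids rewriting the BAM headers is the -G/--read-groups file of bcftools mpileup. As I understand the bcftools manual, a line of the form `* file.bam sample_name` treats every read in that file as a single sample with the given name. The sketch below writes such a file for a list of BAMs. The function name `make_rg_file` and the rule "sample name = BAM file name without `.bam`" are assumptions for illustration; it also assumes paths contain no whitespace and that mpileup matches each path exactly as it is given on the mpileup command line.

```shell
#!/bin/sh
# Hypothetical helper: write a bcftools mpileup -G/--read-groups file that maps
# each whole BAM to one sample, one line per BAM:  * <bam path> <sample name>
# Assumption: the sample name is the BAM's file name without directory or .bam.
# Assumption: paths contain no whitespace (the file is whitespace-separated).
make_rg_file() {
  for bam in "$@"; do
    sample=$(basename "$bam" .bam)
    printf '* %s %s\n' "$bam" "$sample"
  done
}
```

Used, for example, as `make_rg_file ind1.bam ind2.bam ind3.bam > samples.txt` followed by `bcftools mpileup -f ref.fa -G samples.txt -Ou ind1.bam ind2.bam ind3.bam | bcftools call -mv -Oz -o all.vcf.gz`, this should produce one VCF with samples ind1, ind2 and ind3. Here `ref.fa` is a placeholder for your indexed reference; recent bcftools versions expect `-f` unless `--no-reference` is given. Adding proper RG/SM tags to the BAMs is the more durable fix, since the names then travel with the files.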

8 months ago

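bcftools mpileup takes sample names from the read groups, specifically the SM: field of the @RG header lines, which should have been supplied when aligning the reads. Without read groups it falls back to the file name, which is why your sample is called "aligned.bam". See https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups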
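To check what mpileup will see before calling, you can list the SM values found in the @RG header lines. Below is a minimal POSIX sh/awk sketch (the name `rg_samples` is hypothetical) that reads SAM header text on stdin, e.g. `samtools view -H aligned.bam | rg_samples`. Empty output means there are no read groups carrying SM tags, which matches the situation in the question. It de-duplicates names on the assumption that read groups sharing an SM value end up as one sample in the VCF.

```shell
#!/bin/sh
# Hypothetical helper: read SAM header text on stdin and print each distinct
# sample name found in an SM: field of an @RG line, in order of first appearance.
# Prints nothing if the header has no @RG lines carrying SM tags.
rg_samples() {
  awk -F '\t' '
    $1 == "@RG" {
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 3) == "SM:") {
          name = substr($i, 4)
          if (!(name in seen)) { seen[name] = 1; print name }
        }
      }
    }'
}
```

If it prints nothing, read groups can be added at alignment time (for example with `bwa mem -R '@RG\tID:ind1\tSM:ind1' ...`) or afterwards with `samtools addreplacerg` or Picard AddOrReplaceReadGroups; check the respective manuals for the exact options.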


Thanks, I will get to work adding RG and SM tags.
