Question: What's the difference between mpileup output and bcftools call output?
sacha wrote (2.5 years ago):

I guess it's a simple question. Could you explain the difference between the file generated by samtools mpileup and the one generated by bcftools call?

samtools mpileup -g -f ref.fa file.bam > file.bcf

bcftools call -m -Ob -o file2.bcf file.bcf


Evgeniia Golovina answered (2.5 years ago):

Usually these commands are used together:

When run with -g (or -u), samtools mpileup scans every position covered by at least one aligned read, considers all the possible genotypes at that position, and calculates for each genotype the likelihood of observing the raw reads if the sample truly had that genotype.

For example, consider the first 1,000 bases of the reference genome. Suppose position 35 (reference base G) is covered by 27 reads with a G and 2 reads with a T, for a total read depth of 29. In this case the likelihoods strongly favor a homozygous G genotype, and the T reads are most likely sequencing errors. In contrast, if position 400 in the reference is a T but is covered by 2 reads with a C and 66 reads with a G (total read depth of 68), the sample most likely has a homozygous G genotype, i.e. a variant relative to the reference.
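To make the example concrete, here is a minimal Python sketch of how genotype likelihoods can be computed from base counts. It assumes a deliberately simple model (diploid sample, independent reads, a fixed 1% per-base error rate split evenly over the three wrong bases, no base or mapping qualities); samtools' real model is more involved, so treat the numbers as illustrative only. The function names are hypothetical. Likelihoods are Phred-scaled and shifted so the best genotype gets 0, in the spirit of the PL field in VCF/BCF output.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def genotype_log10_likelihoods(counts, error_rate=0.01):
    """Return {genotype: log10 P(reads | genotype)} for all 10 diploid genotypes.

    Simplified model (assumption): each read comes from either allele with
    probability 1/2; a base is read correctly with probability 1 - error_rate
    and as each of the three other bases with probability error_rate / 3.
    """
    def p_base(base, allele):
        return 1 - error_rate if base == allele else error_rate / 3

    log_liks = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        total = 0.0
        for base, n in counts.items():
            # Probability that one read shows `base` given genotype a1/a2.
            p = 0.5 * p_base(base, a1) + 0.5 * p_base(base, a2)
            total += n * math.log10(p)
        log_liks[f"{a1}/{a2}"] = total
    return log_liks


def phred_scaled(log_liks):
    """PL-style values: -10 * log10 likelihood, shifted so the best genotype is 0."""
    best = max(log_liks.values())
    return {gt: round(-10 * (ll - best)) for gt, ll in log_liks.items()}


# Base counts for the two positions from the example.
examples = {
    "position 35 (ref G)": {"G": 27, "T": 2},
    "position 400 (ref T)": {"C": 2, "G": 66},
}
for label, counts in examples.items():
    pl = phred_scaled(genotype_log10_likelihoods(counts))
    best = min(pl, key=pl.get)
    print(label, "-> most likely genotype:", best)
# prints:
# position 35 (ref G) -> most likely genotype: G/G
# position 400 (ref T) -> most likely genotype: G/G
```

Note these are likelihoods, P(reads | genotype), not the probability that a genotype is present; turning them into a call also involves a prior, which is part of what bcftools call does.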

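bcftools call uses the genotype likelihoods generated by samtools mpileup to call the genotype at each site. By default it reports every site it receives, including sites called as reference; with -v (--variants-only) it outputs only the identified variants.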

So file.bcf (the mpileup output) contains genotype likelihoods for every position covered by reads, whereas the output of bcftools call run with -v contains only the sites that were called as variant.
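As a rough illustration of this second step, here is a toy Python sketch that assumes made-up PL values for three hypothetical biallelic sites (ordered 0/0, 0/1, 1/1, as in the VCF PL field). It simply picks the genotype with the lowest PL; the real callers in bcftools call (-c consensus, -m multiallelic) also apply priors, so this is not their algorithm. The `variants_only` switch is a hypothetical stand-in for the effect of -v: sites called 0/0 are dropped.

```python
# Genotype order used by the VCF PL field for a biallelic site.
GENOTYPES = ("0/0", "0/1", "1/1")

# Hypothetical sites with made-up PL values (chrom, pos, PL), for illustration only.
SITES = [
    ("chr1", 100, (0, 45, 255)),   # reads support the reference allele
    ("chr1", 200, (60, 0, 90)),    # heterozygous
    ("chr1", 300, (255, 80, 0)),   # homozygous alternate
]


def call_genotype(pl):
    """Pick the genotype with the lowest PL (highest likelihood); no prior is used."""
    best_index = min(range(len(pl)), key=lambda i: pl[i])
    return GENOTYPES[best_index]


def call_sites(sites, variants_only=False):
    """Return (chrom, pos, GT) for each site; with variants_only, drop 0/0 calls."""
    calls = []
    for chrom, pos, pl in sites:
        gt = call_genotype(pl)
        if variants_only and gt == "0/0":
            continue
        calls.append((chrom, pos, gt))
    return calls


print(call_sites(SITES))
# [('chr1', 100, '0/0'), ('chr1', 200, '0/1'), ('chr1', 300, '1/1')]
print(call_sites(SITES, variants_only=True))
# [('chr1', 200, '0/1'), ('chr1', 300, '1/1')]
```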

If you are interested in specific sites that were not called by bcftools, run the two commands as separate steps and keep the intermediate mpileup file, which still holds the genotype likelihoods for those sites.

Do you want to see example VCF files from both commands?


Thanks for the enlightenment!

Could you edit your post? I think you made a mistake: "Total read depth will be 26" should be "29". Or I may be wrong!

So, in brief, mpileup computes the frequency of each base (homozygote/heterozygote/error), and bcftools is used to filter and keep only the interesting variants. Then bcftools should have a threshold parameter?

If you have the first few lines (head) of both VCF files, could you post them here? It would be useful for me and for other people.

(Reply by sacha, 2.5 years ago)

Yes, 29 is correct. Thanks.

(Reply by Evgeniia Golovina, 2.5 years ago)