Question: bcftools mpileup output format
gravatar for tarek.mohamed
2.2 years ago by
tarek.mohamed270 wrote:

Hi All,

I am trying to use modify bcftools mpileup / call to detect snps from nanopore sequence data with huger depth ( >5000X).

I need to understand the format of the mpileup shown bellow. I used position 84303804 as an example

> bcftools mpileup --skip-indels  -C 0 -d 250 -r chr9:84273123-84368634 --threads 24 --output-type v -f NCBI.GRCh38.fa --output BC01.bcftooslmpileup_noindels.vcf BC01.bam

> cat BC01.bcftooslmpileup_noindels.vcf | grep 84303804
chr9    84303804    .   C   T,A,<*> 0   .   DP=476;I16=73,12,0,32,1524,28372,525,8835,5090,304900,1920,115200,2125,53125,800,20000;QS=0.743777,0.243533,0.0126891,0;VDB=0.00960702;SGB=-0.69312;RPB=0.345296;MQB=0.995226;MQSB=0.992367;BQB=0.134655;MQ0F=0 PL  0,31,127,226,196,166,255,217,172,166
mpileup bcftools • 4.3k views
ADD COMMENTlink written 2.2 years ago by tarek.mohamed270


bcftools mpileup calculates the genotype likelihoods for each position. It is not doing variant calling! What you see is the position, the REF and observed ALT alleles, some mpileup specific information which are used later in variant calling (for there description look into the header of your output) and the likelihoods for each possible genotype.

To make variant calling, you have to pass the output of mpileup to bcftools call.

$ bcftools mpileup --skip-indels  -C 0 -d 250 -r chr9:84273123-84368634 --threads 24 -Ou -f NCBI.GRCh38.fa BC01.bam| bcftools call -mv -Ov -o BC01.bcftooslmpileup_noindels.vcf

fin swimmer

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by finswimmer14k

Hi Thanks for the reply I know about bcftools call, but I want to understand all of mpileup output because as you mentioned this will be used by bcftools call. By modifying bcftools mpileup parameters, hopefully I can call variants form the long reads sequence I have. My bam file pile (with huge depth of coverage) clearly shows the variant positions. If I ran bcftools mpileup | call using default parameters, it Ignore calling some clear variants.

For example, there are 10 numbers after the PL ( probability likelihood) and similarly there are several numbers after I16. I want to understand what are these numbers


ADD REPLYlink written 2.2 years ago by tarek.mohamed270

Hello again,

for the I16 there is a hint in the header:

##INFO=<ID=I16,Number=16,Type=Float,Description="Auxiliary tag used for calling, see description of bcf_callret1_t in bam2bcf.h">

If we look at this file, one can find this:

  // The fields are:
    //      depth fwd   .. ref (0) and non-ref (2)
    //      depth rev   .. ref (1) and non-ref (3)
    //      baseQ       .. ref (4) and non-ref (6)
    //      baseQ^2     .. ref (5) and non-ref (7)
    //      mapQ        .. ref (8) and non-ref (10)
    //      mapQ^2      .. ref (9) and non-ref (11)
    //      minDist     .. ref (12) and non-ref (14)
    //      minDist^2   .. ref (13) and non-ref (15)

The PL values describes a phred scaled likelihood for the different genotypes that are possible like 0/0, 0/1, 1/1, 0/2 etc. The higher the number the less likely this genotype is. But I'm unsure which position represents which genotype.

fin swimmer

ADD REPLYlink written 2.2 years ago by finswimmer14k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1053 users visited in the last hour