Entering edit mode
4.6 years ago
tarek.mohamed ▴ 350
I am trying to use modify bcftools mpileup / call to detect snps from nanopore sequence data with huger depth ( >5000X).
I need to understand the format of the mpileup shown bellow. I used position 84303804 as an example
> bcftools mpileup --skip-indels -C 0 -d 250 -r chr9:84273123-84368634 --threads 24 --output-type v -f NCBI.GRCh38.fa --output BC01.bcftooslmpileup_noindels.vcf BC01.bam > cat BC01.bcftooslmpileup_noindels.vcf | grep 84303804 chr9 84303804 . C T,A,<*> 0 . DP=476;I16=73,12,0,32,1524,28372,525,8835,5090,304900,1920,115200,2125,53125,800,20000;QS=0.743777,0.243533,0.0126891,0;VDB=0.00960702;SGB=-0.69312;RPB=0.345296;MQB=0.995226;MQSB=0.992367;BQB=0.134655;MQ0F=0 PL 0,31,127,226,196,166,255,217,172,166
bcftools mpileupcalculates the genotype likelihoods for each position. It is not doing variant calling! What you see is the position, the REF and observed ALT alleles, some
mpileupspecific information which are used later in variant calling (for there description look into the header of your output) and the likelihoods for each possible genotype.
To make variant calling, you have to pass the output of
Hi Thanks for the reply I know about bcftools call, but I want to understand all of mpileup output because as you mentioned this will be used by bcftools call. By modifying bcftools mpileup parameters, hopefully I can call variants form the long reads sequence I have. My bam file pile (with huge depth of coverage) clearly shows the variant positions. If I ran bcftools mpileup | call using default parameters, it Ignore calling some clear variants.
For example, there are 10 numbers after the PL ( probability likelihood) and similarly there are several numbers after I16. I want to understand what are these numbers
for the I16 there is a hint in the header:
If we look at this file, one can find this:
PLvalues describes a phred scaled likelihood for the different genotypes that are possible like
0/2etc. The higher the number the less likely this genotype is. But I'm unsure which position represents which genotype.