Question: How can I add PV4 or BQ or NS to my VCF file?
2.8 years ago by
sahar.voghoei10 wrote:

I am trying to extract as many features as possible for my machine learning input, and I need PV4, BQ, G3 and NS to be added to my VCF files for all the SNP calls. I used samtools and bcftools as below to extract some of the information:

samtools mpileup --skip-indels -m 1 -E --BCF  --output-tags DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -f test/reference/Aradu.fa test/bam/A72.bam | bcftools call -mv --skip-variants indels --multiallelic-caller --variants-only |bcftools +fill-tags> test/test/At.vcf

The output contains records like this:

Aradu.A01 1345 . T G 5.57134 . DP=6;ADF=0,5;ADR=1,0;AD=1,5;VDB=0.0340507;SGB=-0.590765;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,5,0;MQ=9;NS=1;AF=1;MAF=0;AC_Het=0;AC_Hom=2;AC_Hemi=0;HWE=1 GT:PL:DP:SP:ADF:ADR:AD 1/1:31,10,0:6:0:0,5:1,0:1,5

How can I add any of PV4, BQ, G3 or NS to the file?


snp bcftools samtools vcf • 1.3k views
2.8 years ago by
Kevin Blighe wrote:

Hi Sahar,

I get PV4 (P-values for strand bias, baseQ bias, mapQ bias and tail distance bias) and G3 (ML estimate of genotype frequencies) automatically when I align my data with bwa mem and then call variants with samtools mpileup piped into bcftools call (latest versions).

NS (Number of Samples With Data) may be a tag that was used a lot in the past but that has been more or less replaced. You can obtain similar information by looking at the AC (allele count in genotypes, for each ALT allele, in the same order as listed), AF (allele frequency for each ALT allele, in the same order as listed; use this when estimated from primary data, not called genotypes), and AN (total number of alleles in called genotypes) tags.
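To see which of these tags a record already carries, the INFO column can be split on `;` and searched by tag name. The snippet below is a minimal sketch that uses the INFO string from the record in the question; the helper name `info_tag` is made up for illustration. On that record it reports values for AC, AN, AF and NS (NS=1 is already there, presumably written by `+fill-tags`) and "absent" for PV4, G3 and BQ.

```shell
# INFO column copied from the example record in the question.
info='DP=6;ADF=0,5;ADR=1,0;AD=1,5;VDB=0.0340507;SGB=-0.590765;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,5,0;MQ=9;NS=1;AF=1;MAF=0;AC_Het=0;AC_Hom=2;AC_Hemi=0;HWE=1'

# info_tag INFO TAG
# Hypothetical helper: print the value of TAG, or "absent" if it is missing.
# The comparison is exact, so BQ does not match BQB.
info_tag() {
  printf '%s\n' "$1" | tr ';' '\n' |
    awk -F'=' -v tag="$2" '
      $1 == tag { print $2; found = 1 }
      END       { if (!found) print "absent" }'
}

for tag in AC AN AF NS PV4 G3 BQ; do
  printf '%s=%s\n' "$tag" "$(info_tag "$info" "$tag")"
done
```

For whole files, a dedicated tool such as `bcftools query` is the more usual route; the awk version is shown only because it runs on a single pasted line.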

The same may be true for BQ (RMS base quality at this position). However, using samtools mpileup, you can enforce a minimum base quality on variant bases with the --min-BQ command-line parameter.
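As a rough sketch of where --min-BQ would go, here is the pipeline from the question wrapped in a shell function with that option added. Everything except --min-BQ is taken from the question; the function name `call_snps` and the default threshold of 20 are illustrative assumptions, not recommendations, and the sketch assumes a samtools version whose mpileup still writes BCF, as the original command does.

```shell
# call_snps REF BAM OUT [MIN_BQ]
# Hypothetical wrapper around the question's pipeline, with --min-BQ added.
# MIN_BQ defaults to 20 here purely as an example value.
call_snps() {
  ref=$1
  bam=$2
  out=$3
  min_bq=${4:-20}

  samtools mpileup --skip-indels -m 1 -E --BCF --min-BQ "$min_bq" \
      --output-tags DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR \
      -f "$ref" "$bam" |
    bcftools call -mv --skip-variants indels |
    bcftools +fill-tags > "$out"
}

# Example call with the paths from the question (not run here):
# call_snps test/reference/Aradu.fa test/bam/A72.bam test/test/At.vcf 20
```

Note that this filters which bases are counted; it does not add a BQ tag to the output.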


Thanks, Kevin. Is there any way to get PV4 and G3 from samtools or bcftools alone? My BAM files are huge, and using bwa would add two more steps to my process: (1) converting the BAM files to something suitable for bwa mem, and (2) running bwa mem.

I would prefer to pass over each file as few times as possible (for performance reasons). Do you have any idea how?

written 2.8 years ago by sahar.voghoei

The different alignment tools each record different metrics in the BAM file, which are then used by the downstream tools. BWA and SAMtools/BCFtools come from the same group of developers, whereas Bowtie, TopHat, and other aligners are from different groups.

Just to be sure: which versions of these programs are you using and which aligner did you use?
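One way to collect the versions is sketched below. It assumes the tools are on the PATH under their usual names; the helper name `report_version` is made up for illustration. samtools and bcftools accept `--version`, while bwa prints its version as part of its usage text.

```shell
# report_version TOOL
# Hypothetical helper: print one version line for TOOL,
# or a note if TOOL is not found on PATH.
report_version() {
  tool=$1
  if ! command -v "$tool" >/dev/null 2>&1; then
    printf '%s: not found on PATH\n' "$tool"
    return 0
  fi
  case $tool in
    bwa)
      # bwa has no --version flag; the version is in the usage text on stderr.
      version=$("$tool" 2>&1 | grep -i '^Version' || true)
      ;;
    *)
      version=$("$tool" --version 2>&1 | head -n 1)
      ;;
  esac
  printf '%s: %s\n' "$tool" "$version"
}

for tool in samtools bcftools bwa; do
  report_version "$tool"
done
```

The aligner that produced a BAM file can usually be read from the @PG lines of its header, for example with `samtools view -H test/bam/A72.bam | grep '^@PG'`.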

modified 2.5 years ago • written 2.8 years ago by Kevin Blighe