How can I add PV4, BQ, G3, or NS to my VCF file?
6.4 years ago

I am trying to extract as many features as possible for my machine learning input, and I need PV4, BQ, G3, and NS added to my VCF files for all SNP calls. I used samtools and bcftools as below to extract some of the information:

samtools mpileup --skip-indels -m 1 -E --BCF --output-tags DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -f test/reference/Aradu.fa test/bam/A72.bam | bcftools call -mv --skip-variants indels --multiallelic-caller --variants-only | bcftools +fill-tags > test/test/At.vcf

The output contains records like this:

Aradu.A01 1345 . T G 5.57134 . DP=6;ADF=0,5;ADR=1,0;AD=1,5;VDB=0.0340507;SGB=-0.590765;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,5,0;MQ=9;NS=1;AF=1;MAF=0;AC_Het=0;AC_Hom=2;AC_Hemi=0;HWE=1 GT:PL:DP:SP:ADF:ADR:AD 1/1:31,10,0:6:0:0,5:1,0:1,5

How can I add any of PV4, BQ, G3, or NS to the file?

Thanks

samtools VCF SNP vcf bcftools • 2.6k views
6.4 years ago

Hi Sahar,

I get PV4 (P-values for strand bias, baseQ bias, mapQ bias and tail distance bias) and G3 (ML estimate of genotype frequencies) automatically when I align my data with bwa mem and then call variants with samtools mpileup piped into bcftools call (latest versions).

NS (Number of Samples With Data) may be a tag that was used a lot in the past but that has been more or less replaced. You can obtain similar information by looking at the AC (allele count in genotypes, for each ALT allele, in the same order as listed), AF (allele frequency for each ALT allele in the same order as listed: use this when estimated from primary data, not called genotypes), and AN (total number of alleles in called genotypes) tags.
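As a rough illustration of how these tags relate, here is a minimal shell sketch that pulls AC, AN, and AF out of the INFO column of the record in the question. The helper name `get_tag` is hypothetical, and the comparison assumes AF was computed from called genotypes (as `+fill-tags` appears to have done here) and that there is a single ALT allele.

```shell
# INFO column copied from the record in the question.
info='DP=6;ADF=0,5;ADR=1,0;AD=1,5;VDB=0.0340507;SGB=-0.590765;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,5,0;MQ=9;NS=1;AF=1;MAF=0;AC_Het=0;AC_Hom=2;AC_Hemi=0;HWE=1'

# get_tag KEY INFO: print the value of an exact KEY in a ';'-separated INFO string.
# (Hypothetical helper; exact matching keeps AC from also matching AC_Het etc.)
get_tag() {
    printf '%s\n' "$2" | tr ';' '\n' | awk -F= -v key="$1" '$1 == key { print $2 }'
}

ac=$(get_tag AC "$info")
an=$(get_tag AN "$info")
af=$(get_tag AF "$info")

# With one ALT allele, AC/AN should agree with the AF tag.
awk -v ac="$ac" -v an="$an" 'BEGIN { printf "AC/AN = %g\n", ac / an }'
echo "AF tag = $af"
```

For this record, AC=2 and AN=2, so the ratio and the AF tag should both come out as 1.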

The same may be true for BQ (RMS base quality at this position). However, using samtools mpileup, you can enforce a minimum base quality on variant bases with the --min-BQ command-line parameter.
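To see which of the requested tags a record already carries, a small shell sketch like the following may help. It only inspects the INFO string from the question; the function name `check_tags` is made up for illustration.

```shell
# INFO column copied from the record in the question.
info='DP=6;ADF=0,5;ADR=1,0;AD=1,5;VDB=0.0340507;SGB=-0.590765;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,5,0;MQ=9;NS=1;AF=1;MAF=0;AC_Het=0;AC_Hom=2;AC_Hemi=0;HWE=1'

# check_tags TAG...: report whether each TAG is an exact key in $info.
# (Hypothetical helper, not part of samtools or bcftools.)
check_tags() {
    for tag in "$@"; do
        # Wrap INFO in ';' and match ';TAG=' or ';TAG;' so that BQ
        # is not confused with BQB.
        case ";$info;" in
            *";$tag="*|*";$tag;"*) echo "$tag present" ;;
            *) echo "$tag missing" ;;
        esac
    done
}

check_tags PV4 BQ G3 NS
```

On this record, the check reports NS as present (bcftools +fill-tags has already added NS=1) and PV4, BQ, and G3 as missing.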

Thanks, Kevin. Is there any way to get PV4 and G3 from samtools or bcftools alone? My BAM files are huge, and using bwa adds two more steps to my process: (1) converting the BAM files to something suitable for bwa mem, and (2) running bwa mem.

I would prefer to pass over each file as few times as possible (for performance reasons). Do you have any idea how?

The different alignment tools each record different metrics in the BAM file, which are then used by the downstream tools. BWA and SAMtools/BCFtools come from the same group of developers, whereas Bowtie, TopHat, and other aligners are from different groups.

Just to be sure: which versions of these programs are you using and which aligner did you use?
