Question: How can I add PV4 or BQ or NS to my VCF file?
sahar.voghoei wrote, 15 months ago:

I am trying to extract as many features as possible for my machine learning input, and I need PV4, BQ, G3, and NS added to my VCF files for every SNP call. I used samtools and bcftools as below to extract some of the information:

samtools mpileup --skip-indels -m 1 -E --BCF --output-tags DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -f test/reference/Aradu.fa test/bam/A72.bam | bcftools call -mv --skip-variants indels --multiallelic-caller --variants-only | bcftools +fill-tags > test/test/At.vcf

The output contains records like this:

Aradu.A01 1345 . T G 5.57134 . DP=6;ADF=0,5;ADR=1,0;AD=1,5;VDB=0.0340507;SGB=-0.590765;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,5,0;MQ=9;NS=1;AF=1;MAF=0;AC_Het=0;AC_Hom=2;AC_Hemi=0;HWE=1 GT:PL:DP:SP:ADF:ADR:AD 1/1:31,10,0:6:0:0,5:1,0:1,5
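
To see at a glance which of the requested tags a record already carries, the INFO column (column 8) can be split on `;`. The following is a minimal sketch, assuming a POSIX shell with awk; the helper name `has_info_tag` is hypothetical, and the record is the one shown above. On this record it reports that NS is already present (NS=1, presumably written by the `+fill-tags` step), while PV4, BQ, and G3 are absent.

```shell
# Example record copied from the output above. A real VCF is tab-separated;
# awk's default field splitting handles both tabs and spaces.
record='Aradu.A01 1345 . T G 5.57134 . DP=6;ADF=0,5;ADR=1,0;AD=1,5;VDB=0.0340507;SGB=-0.590765;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,5,0;MQ=9;NS=1;AF=1;MAF=0;AC_Het=0;AC_Hom=2;AC_Hemi=0;HWE=1 GT:PL:DP:SP:ADF:ADR:AD 1/1:31,10,0:6:0:0,5:1,0:1,5'

# has_info_tag RECORD TAG  (hypothetical helper)
# Prints "yes" if TAG is an exact key in the INFO column, otherwise "no".
has_info_tag() {
  printf '%s\n' "$1" | awk -v tag="$2" '{
    n = split($8, kv, ";")          # INFO entries are separated by ";"
    found = "no"
    for (i = 1; i <= n; i++) {
      split(kv[i], pair, "=")       # each entry is KEY=VALUE (or a bare flag)
      if (pair[1] == tag) found = "yes"
    }
    print found
  }'
}

# Report the four tags asked about in the question.
for t in PV4 BQ G3 NS; do
  printf '%s\t%s\n' "$t" "$(has_info_tag "$record" "$t")"
done
```

The match is on the exact key, so BQB (a bias statistic) is not mistaken for BQ.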

How can I add any of PV4, BQ, G3, or NS to the file?

Thanks

Tags: snp, bcftools, samtools, vcf
Answer by Kevin Blighe (Republic of Ireland), 15 months ago:

Hi Sahar,

I get PV4 (P-values for strand bias, baseQ bias, mapQ bias and tail distance bias) and G3 (ML estimate of genotype frequencies) automatically when I align my data with bwa mem and then call variants with samtools mpileup piped into bcftools call (latest versions).

NS (Number of Samples With Data) may be a tag that was used a lot in the past but that has been more or less replaced. You can obtain similar information by looking at the AC (allele count in genotypes, for each ALT allele, in the same order as listed), AF (allele frequency for each ALT allele in the same order as listed: use this when estimated from primary data, not called genotypes), and AN (total number of alleles in called genotypes) tags.
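
How these tags relate can be checked on the record from the question: AC=2 and AN=2 give AC/AN = 1, which matches the reported AF=1. The sketch below assumes a POSIX shell with awk and tr, and a biallelic site (a single ALT allele, so AC holds one number); the helper names `info_value` and `af_from_info` are hypothetical.

```shell
# INFO column of the record shown in the question.
info='DP=6;ADF=0,5;ADR=1,0;AD=1,5;VDB=0.0340507;SGB=-0.590765;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,5,0;MQ=9;NS=1;AF=1;MAF=0;AC_Het=0;AC_Hom=2;AC_Hemi=0;HWE=1'

# info_value INFO KEY  (hypothetical helper)
# Prints the value stored under the exact key KEY, e.g. AC but not AC_Het.
info_value() {
  printf '%s\n' "$1" | tr ';' '\n' | awk -F= -v key="$2" '$1 == key { print $2 }'
}

# af_from_info INFO  (hypothetical helper)
# Recomputes the allele frequency as AC / AN; assumes one ALT allele and AN > 0.
af_from_info() {
  ac=$(info_value "$1" AC)
  an=$(info_value "$1" AN)
  awk -v ac="$ac" -v an="$an" 'BEGIN { printf "%g\n", ac / an }'
}

echo "AC=$(info_value "$info" AC) AN=$(info_value "$info" AN) AF=$(af_from_info "$info")"
```

For diploid samples with no missing genotypes, AN/2 would also equal the number of genotyped samples, which is 1 here and agrees with NS=1 in this record; that equivalence is an assumption and breaks down when some genotypes are missing or samples are not diploid.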

The same may be true for BQ (RMS base quality at this position). However, using samtools mpileup, you can enforce a minimum base quality on variant bases with the --min-BQ command-line parameter.


Thanks, Kevin. Is there any way to get PV4 and G3 from samtools or bcftools? My BAM files are huge, and using bwa would add two more steps to my process: (1) converting the BAM files into something suitable for bwa mem, and (2) running bwa mem.

I would prefer to pass over each file as few times as possible, for performance reasons. Do you have any idea how?

Reply by sahar.voghoei, 15 months ago

The different alignment tools each record different metrics in the BAM file, which are then used by the downstream tools. BWA and SAMtools/BCFtools come from the same group of developers, whereas Bowtie, TopHat, and other aligners are from different groups.

Just to be sure: which versions of these programs are you using and which aligner did you use?

Reply by Kevin Blighe, 15 months ago (modified 12 months ago)