I am trying to extract as many features as possible for my machine learning input, and I need PV4, BQ, G3, and NS added to my VCF files for all SNP calls. I used samtools and bcftools as below to extract some of the information:
samtools mpileup --skip-indels -m 1 -E --BCF --output-tags DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -f test/reference/Aradu.fa test/bam/A72.bam | bcftools call -mv --skip-variants indels --multiallelic-caller --variants-only | bcftools +fill-tags > test/test/At.vcf
The output contains records like this one:
Aradu.A01 1345 . T G 5.57134 . DP=6;ADF=0,5;ADR=1,0;AD=1,5;VDB=0.0340507;SGB=-0.590765;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,5,0;MQ=9;NS=1;AF=1;MAF=0;AC_Het=0;AC_Hom=2;AC_Hemi=0;HWE=1 GT:PL:DP:SP:ADF:ADR:AD 1/1:31,10,0:6:0:0,5:1,0:1,5
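To see which of the requested tags a record already carries, the INFO column can be split into key/value pairs. The following is only a minimal Python sketch, assuming the record is pasted as a single whitespace-separated line as above (a real VCF is tab-separated); the names `parse_info`, `info_of`, and `WANTED` are hypothetical helpers introduced for illustration, not part of samtools or bcftools.

```python
# Example record copied from the output above.
RECORD = (
    "Aradu.A01 1345 . T G 5.57134 . "
    "DP=6;ADF=0,5;ADR=1,0;AD=1,5;VDB=0.0340507;SGB=-0.590765;RPB=1;MQB=1;"
    "MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,5,0;MQ=9;NS=1;AF=1;MAF=0;"
    "AC_Het=0;AC_Hom=2;AC_Hemi=0;HWE=1 "
    "GT:PL:DP:SP:ADF:ADR:AD 1/1:31,10,0:6:0:0,5:1,0:1,5"
)

# Tags asked about in the question.
WANTED = ("PV4", "BQ", "G3", "NS")


def parse_info(info):
    """Turn 'K1=V1;K2=V2;FLAG' into a dict; flag-only keys map to True."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def info_of(record_line):
    """Return the parsed INFO column (8th field) of one VCF data line."""
    fields = record_line.split()  # whitespace split suits this pasted line
    return parse_info(fields[7])


info = info_of(RECORD)
present = [tag for tag in WANTED if tag in info]
missing = [tag for tag in WANTED if tag not in info]
print("present:", present)
print("missing:", missing)
```

For the record shown, this reports NS as present (it appears as NS=1 in the INFO column) and PV4, BQ, and G3 as missing. The lookup matches exact keys, so BQB is not mistaken for BQ.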
How can I include any of PV4, BQ, G3, and NS in the file?
Thanks
Thanks, Kevin. Is there any way to get PV4 and G3 from samtools or bcftools? My BAM files are huge, and using bwa adds two more steps to my process: (1) convert the BAM files to something suitable for bwa mem, and (2) run bwa mem.
I would prefer to pass through each file as few times as possible (for performance reasons). Do you have any idea how?
The different alignment tools each record different metrics in the BAM file, and those metrics are then used by the downstream tools. BWA and SAMtools/BCFtools come from the same group of developers, whereas Bowtie, TopHat, and other aligners are from different groups.
Just to be sure: which versions of these programs are you using and which aligner did you use?