How to extract MQ values from the INFO column of a VCF that has been converted to xlsx?
3.5 years ago

I am working on a project in which I have to extract several QC metrics from the INFO column of a VCF file that has been converted to Excel, but I don't know how to do it other than by hand. The file contains almost 500,000 variants, which makes manual extraction impossible. Is there a faster way of doing this? Thanks a lot!

mq vcf INFO xlsx qc • 1.4k views
3.5 years ago

Well, we cannot see your Excel file, so we do not know in what form the mapping qualities are encoded. Also, I (and I'm sure others) don't recommend using Excel if you still have processing or filtering to perform on your data. If you can get into the habit of filtering with tools/languages like AWK, sed, Python, Perl, BCFtools, etc., then you'll quickly become a more accomplished analyst.

If at all possible, obtain the original VCF; you can then easily output MQ (the RMS mapping quality stored in the INFO column) with the following:

bcftools query -f'[%CHROM:%POS:%REF:%ALT\t%MQ\t%SAMPLE\t%GT\n]' MyVariants.vcf | head

1:69511:A:G     30.14   4432    1/1
1:752721:A:G    49.84   4432    0/1
1:752894:T:C    32.45   4432    1/1
1:762273:G:A    52.12   4432    0/1
1:782981:C:T    60      4432    0/1
1:783304:T:C    47.12   4432    1/1
1:792263:A:G    20.69   4432    1/1
1:792480:C:T    56.51   4432    1/1
1:866319:G:A    60      4432    1/1
1:874314:G:A    60      4432    0/1
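
If the original VCF really cannot be recovered, the INFO strings in the spreadsheet can still be split programmatically. The following is only a minimal sketch in Python, assuming the sheet has been saved as tab-separated text, that it has a column literally named INFO, and that its cells keep the standard VCF `KEY=value;KEY=value;FLAG` layout. The helper names (`parse_info`, `extract_metrics`), the column name, and the two example rows (including their AC and DP values) are hypothetical and only there for illustration.

```python
import csv
import io


def parse_info(info):
    """Split a VCF INFO string such as 'DP=14;MQ=30.14;DB' into a dict."""
    fields = {}
    for entry in info.strip().split(";"):
        if not entry or entry == ".":
            continue  # empty INFO fields are written as '.'
        key, sep, value = entry.partition("=")
        # Flag-type entries (e.g. 'DB') carry no value; store True for them.
        fields[key] = value if sep else True
    return fields


def extract_metrics(rows, keys=("MQ",), info_column="INFO"):
    """Yield one dict per variant row holding only the requested INFO keys."""
    for row in rows:
        info = parse_info(row[info_column])
        # Keys absent from a given row come back as None.
        yield {key: info.get(key) for key in keys}


# Hypothetical two-variant table, as it might look after saving the sheet as TSV.
example_tsv = (
    "CHROM\tPOS\tREF\tALT\tINFO\n"
    "1\t69511\tA\tG\tAC=2;DP=14;MQ=30.14\n"
    "1\t752721\tA\tG\tAC=1;DB;DP=25;MQ=49.84\n"
)

rows = csv.DictReader(io.StringIO(example_tsv), delimiter="\t")
metrics = list(extract_metrics(rows, keys=("MQ", "DP")))
for m in metrics:
    print(m)
```

The values are returned as strings, so convert them with `float()` before filtering on them. To run this on a real export, replace the `io.StringIO(...)` object with an opened file; reading the .xlsx directly, for example with pandas `read_excel`, and applying `parse_info` to the INFO column should work along the same lines.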