How to extract MQ values from the INFO column of a VCF that has been converted to xlsx?
3.5 years ago

I am working on a project in which I have to extract several QC metrics from the INFO column of a VCF file that has been converted to Excel, but I don't know how to do it other than by hand. The file contains almost 500,000 variants, which makes doing it by hand impossible. Is there a faster way of doing this? Thanks a lot!

mq vcf INFO xlsx qc • 1.4k views
3.5 years ago

Well, we cannot see your Excel file, so we do not know how the mapping qualities are encoded in it. Also, I (and, I'm sure, others) don't recommend using Excel if you still have processing or filtering to perform on your data. If you get into the habit of filtering with tools and languages such as AWK, sed, Python, Perl, and BCFtools, you will become a more capable analyst.

If it is at all possible to obtain the original VCF, then you can easily output the mapping quality (the MQ tag in INFO) with the following:

bcftools query -f '[%CHROM:%POS:%REF:%ALT\t%INFO/MQ\t%SAMPLE\t%GT\n]' MyVariants.vcf | head

1:69511:A:G     30.14   4432    1/1
1:752721:A:G    49.84   4432    0/1
1:752894:T:C    32.45   4432    1/1
1:762273:G:A    52.12   4432    0/1
1:782981:C:T    60      4432    0/1
1:783304:T:C    47.12   4432    1/1
1:792263:A:G    20.69   4432    1/1
1:792480:C:T    56.51   4432    1/1
1:866319:G:A    60      4432    1/1
1:874314:G:A    60      4432    0/1
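
If only the spreadsheet is available, something along these lines might work in Python. This is a rough sketch, not a tested pipeline. It assumes the conversion kept each INFO value as a single semicolon-delimited string, as in a standard VCF, which I cannot confirm without seeing the file. The function names, the default metric list (MQ, QD, FS), and the example INFO string are all hypothetical.

```python
def parse_info(info):
    """Split a VCF-style INFO string such as 'AC=2;DB;MQ=30.14' into a dict."""
    fields = {}
    for entry in info.strip().split(";"):
        # Skip empty pieces and the '.' placeholder used for a missing INFO.
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        # Flag-type entries (e.g. 'DB') carry no value; record them as True.
        fields[key] = value if sep else True
    return fields


def extract_metrics(info, keys=("MQ", "QD", "FS")):
    """Return the requested INFO keys as floats.

    A key maps to None when it is absent, is a flag, or does not hold a
    single number (for example a comma-separated per-allele list).
    """
    fields = parse_info(info)
    result = {}
    for key in keys:
        value = fields.get(key)
        try:
            result[key] = float(value) if isinstance(value, str) else None
        except ValueError:
            result[key] = None
    return result


if __name__ == "__main__":
    # Hypothetical INFO string, made up for illustration only.
    example = "AC=2;AF=1.00;AN=2;DB;DP=12;FS=0.000;MQ=30.14;QD=25.36"
    print(extract_metrics(example))
    # {'MQ': 30.14, 'QD': 25.36, 'FS': 0.0}
```

To run this over the whole sheet, you would read the INFO column with a spreadsheet reader such as pandas.read_excel or openpyxl, or export the sheet to CSV first, and call extract_metrics on each cell. If the conversion already split INFO into separate columns, none of this is needed and the metrics can be read directly.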
