Filtering a VCF based on Map Quality
1
0
Entering edit mode
8.5 years ago
devenvyas ▴ 740

I am using a large VCF file (ftp://biodisk.org/Store/Genome/African/Mota_man/Bam_and_VCF/GB20_sort_merge_dedup_l30_IR_q30_mapDamage_Entire.vcf.gz), which I need to filter before converting it to Plink.

I was able to use a custom script to filter it down to the sites I have comparative data for, and I wrote a quick python script to filter out trialleic sites and Qual < 50.

#! /usr/bin/env python

OutFile = open('Mota_2.vcf', 'w')
BadFile = open('Mota_bad.vcf', 'w')
InFile = open('GB20_sort_merge_dedup_l30_IR_q30_mapDamage_Entire_filtered.vcf', 'r')

#filtering the LowQual sites and Qual < 50

import re
for Line in InFile:
    Line=Line.strip('\n')
    first = Line[0]
    if first == '#':
        OutFile.write(Line + '\n')
        #BadFile.write(Line + '\n')
    else:
        ElementListQ = Line.split('\t')
        Alt = ElementListQ[4]
        Qual = ElementListQ[5]
        if len(Alt) != 1:
            BadFile.write(Line + '\n')
        elif ElementListQ[6] == 'LowQual':
            BadFile.write(Line + '\n')
        elif float(Qual) >= 50:
            OutFile.write(Line + '\n')
        else:
            BadFile.write(Line + '\n')
InFile.close()
OutFile.close()
BadFile.close()

The authors talk about only using sites with MQ > 30.

The MQ is tied in the INFO field. Here is an example of one of those

DP=16;VDB=0.958132;SGB=-0.683931;RPB=1;MQB=1;MQSB=0.852144;BQB=1;MQ0F=0;AF1=1;AC1=2;DP4=0,1,10,3;MQ=35;FQ=-33.9875;PV4=0.285714,0.460053,0.350463,1

I was wondering what would be a good of filtering for this? Thanks!

vcf SNP • 4.1k views
ADD COMMENT
1
Entering edit mode
8.5 years ago
adam.maikai ▴ 520

Try vcflib:

vcffilter -f "MQ > 30" in.vcf > out.vcf
ADD COMMENT

Login before adding your answer.

Traffic: 2276 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6