Remove ambiguous calls in the VCF file
5.6 years ago
SOHAIL ▴ 400

Hi everyone,

I have a VCF file with ambiguous REF/ALT calls at some positions of the genome, where the REF allele is one of Y, R, M, K, S, W (i.e. two-base ambiguity codes), e.g.

3       60830534        .       M       C       101     .       .       GT:DP:A:C:G:T:PP:GQ     1/1:24:0,0:14,9:0,0:1,0:1038,0,808,782,114,898,883,114,101,806:101,

Is there any way to remove them all from the VCF file?

Kind regards, Sohail

VCF • 3.1k views
5.6 years ago
ATpoint 82k

This will print the header of the VCF and only those entries where both REF and ALT contain an A, C, T or G:

awk '$1 ~ /^#/ {print $0;next} {if ($4 ~ /A|C|T|G/ && $5 ~ /A|C|T|G/) print $0}' in.vcf > filtered.vcf

@ATpoint Thanks for the reply!

Your command works well for VCF files in which only variants are called (i.e. both the REF and ALT columns contain A/T/G/C).

However, my VCF file was called at all positions of the genome (hom-ref, hom-alt and het sites), together with the ambiguous calls, so column 5 (ALT) may contain a period (i.e. the dot symbol) instead of a base, e.g.

3       60830534        .       M       C       101     .       .       GT:DP:A:C:G:T:PP:GQ     1/1:24:0,0:14,9:0,0:1,0:1038,0,808,782,114,898,883,114,101,806:101,
3       60830535        .       C       .       101     .       .       GT:DP:A:C:G:T:PP:GQ     1/1:24:0,0:14,9:0,0:1,0:1038,0,808,782,114,898,883,114,101,806:101,

When I modified the command as follows, the ambiguous call is still there:

awk '$1 ~ /^#/ {print $0;next} {if ($4 ~ /A|C|T|G/ && $5 ~ /.|A|C|T|G/) print $0}' in.vcf > filtered.vcf

Am I making a mistake?


Sorry, I do not get it. Of the two lines above, the one where REF is M and the one where ALT is ., which should be removed?


@ATpoint, the lines with M (and the other two-base ambiguity codes Y, R, W, K, S) in the VCF file should be removed.

edit: any help???
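
A possible explanation for the modified command: in an awk regular expression an unescaped `.` matches any single character, so `$5 ~ /.|A|C|T|G/` is true for every non-empty ALT field and records with an ambiguity code in ALT pass through. The patterns are also unanchored, so they only test whether the field contains one of the letters somewhere. A minimal sketch that may do what is asked, under these assumptions: REF must consist only of uppercase A/C/G/T, ALT must be either a literal `.` or comma-separated A/C/G/T alleles, and anything else (ambiguity codes, but also N, `*` or symbolic alleles such as `<DEL>`) is dropped. The function name `filter_ambiguous` is made up for this illustration.

```shell
# Hypothetical helper: keep header lines plus records with an unambiguous REF
# and an ALT that is either "." or a comma-separated list of A/C/G/T alleles.
filter_ambiguous() {
  awk '
    /^#/ { print; next }   # pass header lines through unchanged
    $4 ~ /^[ACGT]+$/ && ($5 == "." || $5 ~ /^[ACGT]+(,[ACGT]+)*$/)
  ' "$1"
}

# usage: filter_ambiguous in.vcf > filtered.vcf
```

Comparing ALT with `$5 == "."` avoids the regex meaning of the dot entirely, and the `^...$` anchors make the test apply to the whole field rather than to any single character in it.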
