Hello everyone! I'm trying to filter the dbSNP VCF by using the INFO field.
These are the INFO fields I want:
INFO=<ID=CLNSIG,Number=.,Type=String,Description="Variant Clinical Significance; 0 - Uncertain significance; 1 - not provided; 2 - Benign; 3 - Likely benign; 4 - Likely pathogenic; 5 - Pathogenic; 6 - Drug response; 8 - Confers sensitivity; 9 - Risk factor; 10 - Association; 11 - Protective; 12 - Conflicting interpretations of pathogenicity; 13 - Affects; 14 - Association not found; 15 - Benign/Likely benign; 16 - Pathogenic/Likely pathogenic; 17 - Conflicting data from submitters; 18 - Pathogenic, low penetrance; 19 Likely pathogenic, low penetrance; 20 - Established risk allele; 21 - Likely risk allele; 22 - Uncertain risk allele; 255 - other">
INFO=<ID=PM,Number=0,Type=Flag,Description="Variant has associated publication">
I want to extract all SNPs that have any CLNSIG value, as well as those that have the PM flag.
When I try:
bcftools filter -i 'INFO/CLNSIG>0' GCF_000001405.40.gz -Oz -o vcf_known_sites_newer/GCF_000001405.40_clinvar.gz
Or
bcftools filter -i 'INFO/CLNSIG>1' GCF_000001405.40.gz -Oz -o GCF_000001405.40_clinvar.gz
Or
bcftools filter -i 'INFO/CLNSIG=1' -i 'INFO/CLNSIG=2' [and so on, up to 22] GCF_000001405.40.gz -Oz -o GCF_000001405.40_clin.gz
In every case I get an empty VCF. I also tried bcftools view instead of bcftools filter.
When I use bcftools filter with -i 'INFO/PM=0', -i 'INFO/PM=1', or -i 'INFO/PM', I also get an empty VCF. I have no idea what I'm missing.
VCF from dbSNP (it's huge): dbsnp VCF
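
One sanity check that does not depend on bcftools expression syntax is to look at the INFO column directly. According to the header lines above, CLNSIG is declared as Type=String and PM as a Flag, so a plain text match on the INFO entries should show whether any records carry them. Below is a minimal sketch using awk. The function names filter_clnsig_or_pm and make_toy_vcf are hypothetical, the toy records are made up (not real dbSNP data), and the sketch assumes that any CLNSIG= entry counts as "having a CLNSIG value".

```shell
# Keep header lines plus records whose INFO column (field 8) carries
# a CLNSIG=... entry or a bare PM flag. Reads uncompressed VCF on stdin.
# (Hypothetical helper for illustration; not part of bcftools.)
filter_clnsig_or_pm() {
  awk -F'\t' '
    /^#/ { print; next }                # pass header lines through unchanged
    {
      n = split($8, tags, ";")          # INFO is a semicolon-separated list
      for (i = 1; i <= n; i++) {
        # exact match for the PM flag, prefix match for the CLNSIG key
        if (tags[i] == "PM" || tags[i] ~ /^CLNSIG=/) { print; next }
      }
    }'
}

# Toy input: made-up records, only meant to exercise the filter.
make_toy_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\trs1\tA\tG\t.\t.\tRS=1;PM\n'        # has the PM flag
  printf 'chr1\t200\trs2\tC\tT\t.\t.\tRS=2;CLNSIG=5\n'  # has a CLNSIG entry
  printf 'chr1\t300\trs3\tG\tA\t.\t.\tRS=3;VC=SNV\n'    # has neither
  printf 'chr1\t400\trs4\tT\tC\t.\t.\tRS=4;PMC\n'       # PMC must not match PM
}

make_toy_vcf | filter_clnsig_or_pm
```

On the real file this could be run as something like gzip -dc GCF_000001405.40.gz | filter_clnsig_or_pm | head, assuming the file is gzip- or bgzip-compressed. If that prints records, the tags are present and the problem is in how the bcftools expression is written; if it prints only header lines, the tags are not in this file's records at all.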