I have a VCF file, and part of it looks like this:
I need to extract
Gene.refGene=NONE,DDX11L1, which sits between semicolons. I also need to extract
AAChange.refGene=., which is likewise delimited by semicolons.
I tried to do it like this:
    import sys
    import re

    def parse_vcf(vcf_file):
        pattern = re.compile(r'"([^;]*)"', 'Gene.refGene')
        f = open(vcf_file, 'r')
        for line in f:
            if pattern.search(line):
                continue
        return

    if __name__ == '__main__':
        vcf = sys.argv
        parse_vcf(vcf)
but it is not working. Thank you for your suggestions.
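For reference, here is a minimal sketch of the kind of extraction described above. It assumes the file follows the usual tab-separated VCF layout, where the eighth column (INFO) holds semicolon-separated `key=value` entries. The names `WANTED` and `extract_info_fields` are illustrative, and splitting on `;` is swapped in for the regular expression, so treat it as one possible approach rather than the definitive one.

```python
# Keys to pull out of the INFO column (taken from the question).
WANTED = ("Gene.refGene", "AAChange.refGene")


def extract_info_fields(line, wanted=WANTED):
    """Return {key: value} for the wanted INFO keys found on one VCF line."""
    columns = line.rstrip("\n").split("\t")
    # Assumption: INFO is the 8th tab-separated column, as in standard VCF.
    if len(columns) < 8:
        return {}
    found = {}
    for entry in columns[7].split(";"):
        # partition() splits on the first "=" only, so values may contain "=".
        key, sep, value = entry.partition("=")
        if sep and key in wanted:
            found[key] = value
    return found


def parse_vcf(vcf_path):
    """Yield the extracted fields for every non-header line of the file."""
    with open(vcf_path) as handle:
        for line in handle:
            if line.startswith("#"):  # skip meta-information and header lines
                continue
            yield extract_info_fields(line)
```

It could be driven with something like `for fields in parse_vcf(sys.argv[1]): print(fields)`. Note that `sys.argv` itself is a list, so the file path has to be taken from `sys.argv[1]`.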