Question

PyVCF error: KeyError: 'ANN' when appending annotation fields to a list

6.9 years ago

spiral01 ▴ 110

I have downloaded the 1000 Genomes phase 3 VCF file for chromosome 1 and annotated it using snpEff. I am now trying to parse the annotated file to create a new text file containing only the data I need. My problem is with parsing the annotation field. The relevant code is below:

    import vcf  # PyVCF
    tempList = []
    vcf_reader = vcf.Reader(open('/ann.chr1.vcf', 'r'))
    for record in vcf_reader:
        annList = [i.split('|') for i in record.INFO['ANN']]

This starts running, but then fails with the error:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
KeyError: 'ANN'

I have tried running this same code on a much smaller file (I literally just took a few lines from the vcf file as well as the metadata to create a small file to test on) and it runs fine, but when I run it on the full vcf file I am getting this error. I have even tried appending the whole 'ANN' field to a list using:

annList = []
for record in vcf_reader:
    annList.append(record.INFO['ANN'])

This works fine for all other fields (e.g. record.INFO['CHROM']) but I get the same error when it comes to the 'ANN' field. The code does run for a while, and I have checked the length of the list, but it is different every time I run the code, which suggests it is stopping at a different point each time. As such, I really am not sure what is going on here. Thanks.

python PyVCF snp • 1.7k views

6.9 years ago by spiral01 ▴ 110

Try debugging with a try-except block to find out which records trigger the error, then inspect those records:

annList = []
for record in vcf_reader:
    try:
        annList.append(record.INFO['ANN'])
    except KeyError:
        # this record has no ANN entry in its INFO field
        print(record)
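If the output shows that some records simply have no ANN entry, one possible way to handle them is sketched below. It looks the key up with `dict.get`, so a missing annotation is recorded and skipped rather than raising `KeyError`. The helper name `collect_annotations` and the stand-in records are hypothetical and only there so the snippet runs without the VCF file. It assumes each record exposes `INFO` as a dict along with `CHROM` and `POS`, as PyVCF records do.

```python
from types import SimpleNamespace


def collect_annotations(records):
    """Split each record's ANN entries on '|' and note records without ANN."""
    annotations = []
    missing = []
    for record in records:
        # .get returns None instead of raising KeyError when ANN is absent
        ann = record.INFO.get('ANN')
        if ann is None:
            missing.append((record.CHROM, record.POS))
            continue
        annotations.append([entry.split('|') for entry in ann])
    return annotations, missing


# Stand-in records for illustration only (hypothetical data);
# with PyVCF you would pass the vcf.Reader object instead.
demo_records = [
    SimpleNamespace(CHROM='1', POS=100, INFO={'ANN': ['A|missense_variant|MODERATE']}),
    SimpleNamespace(CHROM='1', POS=200, INFO={'AC': [3]}),
]

annotations, missing = collect_annotations(demo_records)
print(len(annotations), "annotated records;", len(missing), "without ANN:", missing)
```

Keeping the positions of the skipped records makes it easy to go back to the VCF afterwards and check why those lines were not annotated.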

6.9 years ago by WouterDeCoster 47k