Question: PyVCF is giving 'AttributeError' when extracting values from FORMAT and SAMPLE column.
19 months ago, kirannbishwa01 (United States) wrote:

I have a VCF file with the following structure:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  2ms01e  2ms02g  2ms03g  2ms04h
2   1738    .   A   G   4693.24 PASS    AC=2;AF=0.250;AN=8;set=Intersection GT:AD:DP:GQ:PB:PC:PG:PI:PL:PW      0|1:389,92:481:99:.,.,.,.,.:1.0:0|1:1020:1748,0,12243:0|1       0/0:318,0:318:99:.:.:0/0:.:0,120,1800:0/0       0|1:270,53:323:99:.,.,.,.,.:1.0:0|1:1258:990,0,9096:0|1     0/0:473,0:473:99:.:.:0/0:.:0,120,1800:0/0
2   1764    .   A   C   51892.85    PASS    AC=5;AF=0.625;AN=8;set=Intersection GT:AD:DP:GQ:PB:PC:PG:PI:PL:PW      1|0:102,415:517:99:.,.,.,.,.:1.0:1|0:1020:12332,0,2817:1|0      1/1:0,356:356:99:.:.:1/1:.:12587,1069,0:1/1     1|0:65,301:366:99:.,.,.,.,.:1.0:1|0:1258:9337,0,1279:1|0    0/1:281,353:634:99:.:.:0/1:.:10325,0,7548:0/1
2   1921    .   T   C   4465.03 PASS    AC=0;AF=0.00;AN=6;set=Intersection  GT:AD:DP:GQ:PG:PL:PW    0/0:1,0:1:3:0/0:0,3,35:0/0  ./.:0,0:0:.:./.:0,0,0:./.   0/0:1,0:1:3:0/0:0,3,39:0/0  0/0:2,0:2:6:0/0:0,6,80:0/0

Problem: The number of fields in the FORMAT column (the 9th column, index 8 in zero-based Python indexing) isn't the same for all lines.

I want to read this file and extract the values of specific tags such as GT, PI and PG. However, not all of these tags are present on every line; where a tag is missing, I just want its value to default to '.'

So, the output file would have the following structure:

contig  pos ref alt_My  freq_My  GT  PI  PG
2   1764    A   C   0.625   1|0   1020  1|0
2   1921    T   C   0.00    0/0   .   0/0
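
The defaulting behaviour described above can be sketched in plain Python, without PyVCF. This is only an illustration under stated assumptions: the function name extract_tags is hypothetical, and the line is assumed to be tab-delimited with FORMAT in the 9th column and one sample per following column. Each sample field is paired with the FORMAT keys of its own line, so a tag absent from that line falls back to '.'.

```python
def extract_tags(vcf_line, tags=("GT", "PI", "PG"), default="."):
    """Return one list of tag values per sample, using `default` for absent tags."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")  # FORMAT is the 9th column (index 8)
    rows = []
    for sample_field in fields[9:]:
        # Pair this line's FORMAT keys with this sample's values.
        values = dict(zip(format_keys, sample_field.split(":")))
        rows.append([values.get(tag, default) for tag in tags])
    return rows


# The record at position 1921 from the question (first two samples only);
# its FORMAT column has no PI tag.
line = "\t".join([
    "2", "1921", ".", "T", "C", "4465.03", "PASS",
    "AC=0;AF=0.00;AN=6;set=Intersection",
    "GT:AD:DP:GQ:PG:PL:PW",
    "0/0:1,0:1:3:0/0:0,3,35:0/0",
    "./.:0,0:0:.:./.:0,0,0:./.",
])
rows = extract_tags(line)
```

Because the lookup is a dictionary get with a default, no exception handling is needed for lines that lack PI.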

I am using the PyVCF module to read the file and extract this information. Below is my script:

import vcf

vcf1_data = vcf.Reader(open('MY.phased_variants.Final_sub.vcf', 'r'))
for record in vcf1_data:
    contig1 = record.CHROM
    pos1 = record.POS
    ref_allele1 = record.REF
    # ALT and INFO['AF'] are lists; join them into comma-separated strings
    alt_alleles1 = ",".join(map(str, record.ALT))
    alt_freq1 = ",".join(map(str, record.INFO['AF']))

Now, I write these called values to an output text file as:

    output = open("My_allele_table.txt", "a")
    output.write("{}\t{}\t{}\t{}\t{}"
                 .format(contig1, pos1, ref_allele1, alt_alleles1, alt_freq1))

Additionally, I append other values to the output file. But when doing so I get an AttributeError, since the PI field is not present on every line.

Incomplete solution: I added exception handling for the error, but then it simply skips the rest of the line.

    for sample in record.samples:
        try:
            output.write("\t{}\t{}\t{}".format(sample['GT'], sample['PI'], sample['PG']))
        except AttributeError:
            pass  # a missing tag drops all three values for this sample

Any help appreciated!
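
One possible way around the skipping is to guard each tag lookup separately, so that a missing PI does not discard GT and PG as well. The sketch below is not tested against PyVCF itself: StandInCall is a hypothetical stand-in that mimics the behaviour reported in the question (item access raising AttributeError for a tag absent from the record), and get_tag is an illustrative helper name.

```python
from collections import namedtuple


class StandInCall:
    """Hypothetical stand-in for a PyVCF sample call, for demonstration only."""

    def __init__(self, **fields):
        self.data = namedtuple("CallData", list(fields))(**fields)

    def __getitem__(self, key):
        # Raises AttributeError when the tag is not among this record's fields.
        return getattr(self.data, key)


def get_tag(sample, tag, default="."):
    """Look up one FORMAT tag; fall back to `default` if it is absent or empty."""
    try:
        value = sample[tag]
    except (AttributeError, KeyError):
        return default
    return default if value is None else value


# A call from a record whose FORMAT column has no PI tag.
sample = StandInCall(GT="0/0", PG="0/0")
values = [get_tag(sample, tag) for tag in ("GT", "PI", "PG")]
line_part = "\t{}\t{}\t{}".format(*values)
```

With a helper like this, the write inside the sample loop would no longer need its own try/except, since each field resolves independently.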

Tags: rna-seq, gt, format, pyvcf, vcf

Why not simply use if sample['PI'] ... else ... ?

(comment written 19 months ago by geek_y)

@Goutham: Can you please provide more details on your code?

(reply written 19 months ago by kirannbishwa01)
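
The if ... else idea from the comment could look roughly like the sketch below. Since the question reports that accessing sample['PI'] itself raises the error, the condition here tests the record's FORMAT keys instead of the value. The assumptions: the record exposes its FORMAT column as a colon-separated string in an attribute named FORMAT, and the objects used are plain stand-ins (a SimpleNamespace and a dict), not real PyVCF objects. The function name pi_or_default is hypothetical.

```python
from types import SimpleNamespace


def pi_or_default(record, sample, default="."):
    """Return the sample's PI value, or `default` when the record has no PI tag."""
    # Check the FORMAT keys first, so sample["PI"] is only evaluated
    # when the tag exists for this record.
    if "PI" in record.FORMAT.split(":"):
        value = sample["PI"]
        return default if value is None else value
    else:
        return default


# Stand-ins for a record without PI and a record with PI.
record_without = SimpleNamespace(FORMAT="GT:AD:DP:GQ:PG:PL:PW")
record_with = SimpleNamespace(FORMAT="GT:AD:DP:GQ:PB:PC:PG:PI:PL:PW")

missing = pi_or_default(record_without, {"GT": "0/0", "PG": "0/0"})
present = pi_or_default(record_with, {"GT": "0|1", "PI": 1020, "PG": "0|1"})
```

The same membership test generalises to any tag by passing the tag name as a parameter.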