How to call genotype bases for all the samples in a VCF using a for loop?
4.6 years ago
kirannbishwa01 ★ 1.3k

I can get the genotype bases for one sample at a time using the PyVCF module like this:

        for sample in record.samples:
                call = record.genotype('MA611')
                print(call.gt_bases)

Output:

    A/A
    A|G
    G/G
    C|T

But how can I print the gt_bases for all the samples in a VCF file using a for loop? I tried:

        for sample in record.samples:
            for x in record.genotype:
                call = record.genotype(x)
                print(call.gt_bases)

I am getting this error:

        for x in record.genotype:
    TypeError: 'method' object is not iterable

Thanks!

pyvcf vcf snps GT software error • 2.3k views

Isn't this an odd loop? You loop over all samples, but never actually use that variable sample in your loop.

    for sample in record.samples:
        call = record.genotype('MA611')
        print(call.gt_bases)
4.6 years ago
kirannbishwa01 ★ 1.3k

Yes, it is.

record.genotype is a method that looks up the call for one sample, but the sample name has to be specified. I wanted to do this without having to specify the sample name by hand. The PyVCF tutorial doesn't have such an example, nor is it described elsewhere. So I resorted to printing the line, which is a dictionary, converting it to a string, splitting it, and then selecting the sample name by index. That finally worked, but I don't think it's the best way to do it. It's OK for now.
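For what it's worth, the string-splitting workaround can probably be avoided, because PyVCF exposes the sample names directly (the reader has a `samples` list of names, and each call has a `sample` attribute). Here is a minimal sketch of looking calls up by name. `Call`, `FakeRecord` and `bases_for` are hypothetical stand-ins that I made up so the example runs without a VCF file; only `genotype(name)` and `gt_bases` mirror the PyVCF names used in this thread.

```python
from collections import namedtuple

# Hypothetical stand-in for a PyVCF call: only the two attributes used here.
Call = namedtuple("Call", ["sample", "gt_bases"])


class FakeRecord:
    """Hypothetical stand-in for a PyVCF record (not a PyVCF class)."""

    def __init__(self, calls):
        self.samples = calls
        # Same idea as the lookup shown in the traceback later in this thread:
        # map each sample name to its position in the samples list.
        self._sample_indexes = {c.sample: i for i, c in enumerate(calls)}

    def genotype(self, name):
        # Takes a sample NAME (a string), not a call object.
        return self.samples[self._sample_indexes[name]]


def bases_for(record, sample_names):
    """Return gt_bases for each named sample, in the order given."""
    return [record.genotype(name).gt_bases for name in sample_names]


# With PyVCF these names would come from the reader's `samples` list.
sample_names = ["MA605", "MA611"]
record = FakeRecord([Call("MA605", "G/G"), Call("MA611", "C|T")])
print(bases_for(record, sample_names))
```

The point is only that `genotype()` wants a name string, so any list of names can drive the loop and nothing has to be parsed out of printed text.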

4.6 years ago
DG 7.2k

Pretty sure you have everything you need there, you're just not using it:

    for record in records:
        for sample in record.samples:
            call = record.genotype(sample)
            print(call.gt_bases)

@Dan Gaston:

I had a chance to revisit this problem, but it's still not working.

I tried the following, reversing the outer loop since there is no method called record:

    for records in record:
        for sample in record.samples:
            print(sample)  # works, and prints:
            # Call(sample=MA605, CallData(GT=0/0, AD=[4, 0], DP=4, GQ=12, PG=0/0, PL=[0, 12, 163], PW=0/0))

But as soon as I do:

        call = record.genotype(sample)
        print(call.gt_bases)

I get this error message:

    Traceback (most recent call last):
      File "/home/everestial007/PycharmProjects/stitcher/pHASE-Stitcher-Markov/markov_final_test/for_loop_test_inVcf.py", line 74, in <module>
        call = record.genotype(sample)
      File "/home/everestial007/anaconda3/lib/python3.5/site-packages/vcf/model.py", line 277, in genotype
        return self.samples[self._sample_indexes[name]]
    TypeError: unhashable type: '_Call'

The problem seems to be that sample is not a one-word string holding the sample name but a whole _Call object, which prints like a tuple.

Any suggestions?


The data structure has the attribute sample right there. So instead of call = record.genotype(sample), it is probably call = record.genotype(sample.sample), or something along those lines.
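A minimal sketch of that suggestion, going by the printed `Call(sample=MA605, ...)` output and the traceback above: each item in record.samples is already a call object, so its `sample` attribute holds the name and `gt_bases` can be read from it directly, without going back through record.genotype(). The `Call` and `Record` namedtuples and the function name `gt_bases_by_sample` are hypothetical stand-ins so the example runs without a VCF file; they are not PyVCF classes.

```python
from collections import namedtuple

# Hypothetical stand-ins that mimic only the attributes used below.
# With PyVCF, records would instead come from something like:
#     for record in vcf.Reader(filename="input.vcf"): ...
Call = namedtuple("Call", ["sample", "gt_bases"])
Record = namedtuple("Record", ["POS", "samples"])


def gt_bases_by_sample(record):
    """Map each sample name to its genotype bases for one record."""
    # Each element of record.samples is already a call object, so there is
    # no need to pass it back into record.genotype().
    return {call.sample: call.gt_bases for call in record.samples}


record = Record(POS=101, samples=[Call("MA605", "A/A"), Call("MA611", "A|G")])
for name, bases in gt_bases_by_sample(record).items():
    print(name, bases)
```

If the name-based lookup is preferred, `record.genotype(call.sample).gt_bases` should give the same string, since `call.sample` is the plain name that `genotype()` expects.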


@Dan Gaston: Can you please look into this problem? How can I write the variant (allele) information back into a VCF file?
