Question: How do I get PyVCF to give me an error if the record it is fetching doesn't exist?
0
gravatar for devenvyas
5 weeks ago by
devenvyas580
Stony Brook
devenvyas580 wrote:

I have a Python script that is supposed to be reading some emit-all VCF files and outputting a transposed Plink file for biallelic sites. The VCFs were called using custom in-house callers and thus are slightly out of spec (since they are ancient DNA), so standard programs like vcftools or bcftools are of no use. Each individual is in their own VCF.

I am having some issues particularly when one or more individuals is missing a record line altogether (i.e., some VCFs are missing a line for chr21 base 9412435). In these cases, the script just moves on the next individual in the loop for the given site. Here is an excerpt of the tped. There are 24 individuals, so there should be 48 allele calls (or 0s). Note how the first two have the appropriate number of genotypes, but the last four don't.

21 21:9412269 0 9412269 A A A A A A A A 0 0 T T 0 0 A A 0 0 0 0 0 0 0 0 A A A A T T 0 0 T T T T A A A A A A A A A A T T
21 21:9412298 0 9412298 T T T T T T T T T T T T 0 0 T T T T 0 0 T T T T T T T T T T 0 0 T T T T T T T T T T T T T T T T
21 21:9412357 0 9412357 C C 0 0 0 0 C C C C C C 0 0 0 0 C C 0 0 0 0 0 0 C C 0 0 C C C C C C C C C C C C C C A A C C
21 21:9412435 0 9412435 0 0 0 0 0 0 0 0 T T 0 0 0 0 0 0 0 0 T T 0 0 0 0 0 0 0 0
21 21:9412498 0 9412498 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 G G 0 0 0 0 0 0 G G 0 0 0 0 0 0 G G 0 0 G G G G
21 21:9412503 0 9412503 0 0 0 0 C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C C 0 0 C C C C

I am trying to find a way to get pyvcf to do something along the lines of this:

for record in file_read[h].fetch(chromosome,pos-1,pos):
    try:
        if record "exists":
            ##figure out what genotype to add
        elif record "doesn't exist"
            outD=outD+' 0 0'

Anyone have any suggestions?

vcf • 129 views
ADD COMMENTlink written 5 weeks ago by devenvyas580

Hello,

with the code you are showing, it isn't clear to me how you parse your data. Can you please show some more?

In general: If you know how much column you expect check this for each line and handle differences appropriate.

Also using pandas dataframe might be useful in your case.

fin swimmer

ADD REPLYlink written 5 weeks ago by finswimmer12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1798 users visited in the last hour