vcfnulldotslashdot Type-Error: can't multiply sequence by non-int of type 'float'
0
0
Entering edit mode
3.2 years ago

Hi everyone,

I'm working on a CentOS based computer cluster and wanted to run vcfnulldotslashdot on some qualityfiltered and normalized vcf-files but got an Error message regarding line 18 of the vcfnulldotslashdot-script:

Line 18: flatgls = ",".join([str(x) for x in [0]*multcoeff(alleles,2)])
TypeError: can't multiply sequence by non-int of type 'float'

This is the nulldotslashdotscript:

#!/usr/bin/env python3
2. # Rewrite null genotypes to ./.
3. #
4. import sys
5. import math
6. 
7. def bincoeff(n,k): return math.factorial(n) / (math.factorial(n-k)*math.factorial(k))
8. def multcoeff(n,k): return bincoeff(n+k-1,k)
9. 
10. for line in sys.stdin:
11.     if line.startswith("#"):
12.         print(line.strip())
13.         continue
14.     fields = line.strip().split("\t")
15.     alleles = len(fields[4].split(","))+1
16.     # assume that we have GT:GL
17.     # how many genotypes?  assume diploid
18.     flatgls = ",".join([str(x) for x in [0]*multcoeff(alleles,2)])
19.     for i in range(9, len(fields)):
20.         if fields[i] == ".":
21.             fields[i] = "./.:" + flatgls
22.     print("\t".join(fields))

I figured the problem was likely that for some reason the multcoeff got formatted as a float number which then caused some trouble when applying to the [0]*multcoeff command. As the binomialcoefficient should be a whole number in theory I tried to resolve this by changing line 8 of the code a little and just converting multcoeff to an integer.

Like this:

1. #!/usr/bin/env python3
2. # Rewrite null genotypes to ./.
3. #
4. import sys
5. import math
6. 
7. def bincoeff(n,k): return math.factorial(n) / (math.factorial(n-k)*math.factorial(k))
8. def multcoeff(n,k): return int(bincoeff(n+k-1,k))
9. 
10. for line in sys.stdin:
11.     if line.startswith("#"):
12.         print(line.strip())
13.         continue
14.     fields = line.strip().split("\t")
15.     alleles = len(fields[4].split(","))+1
16.     # assume that we have GT:GL
17.     # how many genotypes?  assume diploid
18.     flatgls = ",".join([str(x) for x in [0]*multcoeff(alleles,2)])
19.     for i in range(9, len(fields)):
20.         if fields[i] == ".":
21.             fields[i] = "./.:" + flatgls
22.     print("\t".join(fields))

It seemed to work, at least I didn't get the error anymore. But I'm quite new to programming and bioinformatics and am a bit insecure now. I'm wondering if I can really trust the resulting vcf-file and the change of the code didn't change anything about the function. Also I'm puzzled why nobody else seems to have had this issue before, at least I couldn't find anything on google about it.

Would be very grateful if someone with more expertise could comment on this.

software error SNP • 579 views
ADD COMMENT

Login before adding your answer.

Traffic: 2526 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6