For an interview, I was required to write a Python script with the following specifications:
"Send me a Python script that conforms to the following specifications:
1. It takes a single command line argument, which is the full path to a fasta file
2. It writes one line on the standard out for each fasta record
2a. The line consists of two values, separated by a space: the sequence id, and the percent GC"
Below is the script that I came up with. The interviewer got back to me a few weeks later and said that my program had a bug and did not follow the specifications. He was not able to say what the errors were, which left me confused, because the program was running fine when I sent it to him.
Can anyone point out a bug or how I failed to follow specifications?
Thank you for your help!
    def gc_content (filename):
        gc = at = unknown = 0
        with open(filename, 'r') as f:
            for line in f:
                if line[:1] == '>':
                    if (gc + at) > 0:  # used to skip first ID line
                        total = gc + at
                        percentage = int(round(gc / total * 100))
                        print "%s %d" %(seq_id, percentage)
                    seq_id = line.strip()  # saves the ID line for printing later
                    gc = at = unknown = 0
                else:
                    nuc_str = list(line.strip())
                    for n in nuc_str:
                        if n == 'G' or n == 'g' or n == 'C' or n == 'c':
                            gc += 1.0
                        elif n == 'A' or n == 'a' or n == 'T' or n == 't':
                            at += 1.0
                        else:
                            unknown += 1.0  # usually represented by an 'N' in the fasta file.
        print "%s %d" % (seq_id, percentage)
For some reason indenting is off at the top, but it is correct in my original script
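
For comparison, here is a minimal sketch of one possible reading of the specification. It is not necessarily what the interviewer had in mind. The helper names (`iter_fasta`, `percent_gc`, `main`) are made up for illustration, and several choices are assumptions the spec does not settle: the sequence id is taken to be the first token after `>` (without the `>`), percent GC is computed over every character in the record (including `N`), an empty record reports 0, and the value is printed with two decimals. Unlike the script above, it reads the path from `sys.argv` and computes the percentage for the last record after the loop ends.

```python
import sys


def iter_fasta(lines):
    """Yield (seq_id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the id is the first token after '>', without the '>'.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    # The last record has no following header, so emit it here.
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def percent_gc(sequence):
    """Percent of G/C among all characters; 0.0 for an empty record (assumption)."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def main(path):
    with open(path) as handle:
        for seq_id, sequence in iter_fasta(handle):
            # Assumption: two decimals; the spec does not give a number format.
            print("%s %.2f" % (seq_id, percent_gc(sequence)))


# Only run when exactly one command line argument is supplied.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved under a hypothetical name such as `gc_content.py`, it would be run as `python gc_content.py /full/path/to/file.fasta`. The `print` call takes a single pre-formatted string, so it behaves the same under Python 2 and Python 3.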