I have an assignment to write a hash table class in Python and use it on a text file of restriction enzymes, with the recognition sequences in the first column and the restriction enzyme (RE) names in the second. The goal is to generate a random 6 bp sequence and, if it exists in the text file, print the corresponding enzyme name. I can already do this by reading the file into lists, turning them into a dictionary, and using get() to retrieve the enzyme name, but the assignment requires me to write the class myself.
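For comparison, the dictionary-based version described above could look roughly like the sketch below. It assumes a tab-separated file with the sequence first and the enzyme name second; the sample lines and the names `load_enzymes` and `random_dna` are illustrative stand-ins, not taken from the actual file or assignment.

```python
import random


def load_enzymes(lines):
    """Build a {sequence: enzyme name} dict from tab-separated lines."""
    enzymes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seq, name = line.split('\t')
        enzymes[seq] = name
    return enzymes


def random_dna(length):
    """Return a random DNA string of the given length."""
    return ''.join(random.choice('ATCG') for _ in range(length))


# Illustrative sample standing in for restriction_enzymes.txt (assumption)
sample_lines = ["AACGTT\tAclI\n", "GAATTC\tEcoRI\n"]
enzymes = load_enzymes(sample_lines)

dna = random_dna(6)
print(dna)
print(enzymes.get(dna))       # None unless the random sequence happens to match
print(enzymes.get('AACGTT'))  # AclI
```

Here get() returns None for a missing key instead of raising an error, which is the behaviour the class version is meant to reproduce.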
Here is the hash class and the random sequence generator (there may be indentation errors from pasting):

    import random

    class KeyValue:
        def __init__(self, key, value):
            self.key = key
            self.value = value

        def __str__(self):
            return str(self.key) + ":" + str(self.value)

    class HashTable:

        def __init__(self, SIZE):
            i = 0
            self.list = []
            for i in range(SIZE):
                self.list.append([])
                i = i + 1
            self.SIZE = len(self.list)

        def getValue(self, key):
            h = self.hash(key)
            bucket = self.list[h]
            for kv in bucket:
                if kv.key == key:
                    return kv.value

        def setValue(self, key, value):
            h = self.hash(key)
            # should search first so we don't put key in twice, but for now ignore
            self.list[h].append(KeyValue(key, value))

        def hash(self, key):
            i = 0
            total = 0
            while i < len(key):
                total = total + ord(key[i])
                i = i + 1
            return total % self.SIZE

    def random_DNA(length):
        return ''.join(random.choice('ATCG') for _ in xrange(length))
Here is the script that imports the module:

    from HashTable import *

    fh = open("restriction_enzymes.txt", "r")
    num_lines = int(sum(1 for line in fh))
    print num_lines
    hashtable = HashTable(int(num_lines))
    for line in fh:
        (key, value) = line.strip().split('\t')
        hashtable.setValue(key, value)
    print hashtable
    DNA = random_DNA(6)
    print DNA
    print hashtable.getValue(DNA)
    print hashtable.getValue('AACGTT')
First, I want to be able to print the hash table so that I can visualize its contents. When I try this with the line "print hashtable" after the table is created, I get the output "<HashTable.HashTable instance at 0x7ff53555db90>", which I guess is the object's location in memory. Do I have to implement a __str__ or __repr__ method in the HashTable class? Can someone assist with this?
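To illustrate the difference in question, here is a minimal sketch with two made-up classes: one with no string method, which prints the default representation, and one with a `__str__` that lists its non-empty buckets. The class names and the bucket layout (lists of key/value tuples) are assumptions for illustration only, not the HashTable class itself.

```python
class Plain(object):
    """No __str__ or __repr__: printing falls back to the default form."""
    pass


class BucketTable(object):
    """Hypothetical stand-in for a table that stores lists of (key, value)."""

    def __init__(self, buckets):
        self.buckets = buckets

    def __str__(self):
        # One line per non-empty bucket: "index: key:value, key:value"
        rows = []
        for index, bucket in enumerate(self.buckets):
            if bucket:
                pairs = ", ".join("%s:%s" % (k, v) for k, v in bucket)
                rows.append("%d: %s" % (index, pairs))
        return "\n".join(rows)


print(Plain())  # default representation, includes a memory address

table = BucketTable([[], [("AACGTT", "AclI")], []])
print(table)    # 1: AACGTT:AclI
```

Printing an object calls its `__str__` if one is defined, so something along these lines on the real class would be one way to see what each bucket holds.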
Regarding the main output of the program: when I pass the random sequence, DNA, to "print hashtable.getValue(DNA)", I get "None". OK, so that random sequence isn't in the file. So I tried copying and pasting the first sequence from the text file, 'AACGTT', which corresponds to the RE AclI. However, I still get "None". Does anyone have any idea what I'm doing wrong here?
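One thing that may be worth ruling out for the "None" result: a file object is an iterator, so a loop that runs after the lines have already been counted may see no lines at all, which would leave the table empty. The sketch below uses `io.StringIO` as a stand-in for the real file (an assumption for illustration, with made-up sample lines) and shows the effect, along with `seek(0)` to rewind.

```python
import io

# io.StringIO stands in for the opened text file (illustrative sample data)
fh = io.StringIO(u"AACGTT\tAclI\nGAATTC\tEcoRI\n")

num_lines = sum(1 for line in fh)    # reads the file to the end
second_pass = [line for line in fh]  # nothing left to read

fh.seek(0)                           # rewind to the start of the file
third_pass = [line for line in fh]

print(num_lines)         # 2
print(len(second_pass))  # 0
print(len(third_pass))   # 2
```

If this is what is happening, printing the number of stored entries right after the loading loop would confirm it.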
All the sequences in the text file are 4, 6, or 8 bp long. Since the random sequence is 6 bp, will the hash lookup still find an entry whose key is 4 or 8 bp long when the 6 bp sequence matches part of it (or contains it)? Or should I implement a separate function that retrieves entries based on nucleotide matches?
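On the question of lengths: the lookup in getValue compares whole keys with ==, so a 6 bp query can only return an entry whose key is that exact 6 bp string. A shorter or longer key usually hashes to a different bucket and would not compare equal in any case. The sketch below illustrates this and shows one possible substring check for shorter sites; the table size, the site list, and the function names are made up for illustration.

```python
def char_sum_hash(key, size):
    """Same idea as HashTable.hash: sum of character codes modulo size."""
    return sum(ord(ch) for ch in key) % size


SIZE = 10  # arbitrary table size for illustration

print(char_sum_hash('AACGTT', SIZE))  # 6
print(char_sum_hash('ACGT', SIZE))    # 7


def sites_within(dna, known_sites):
    """Return the known sites that occur anywhere inside dna."""
    return [site for site in known_sites if site in dna]


# Hypothetical site list; only exact substrings of the query are reported
print(sites_within('AACGTT', ['ACGT', 'GAATTC', 'AACGTT']))  # ['ACGT', 'AACGTT']
```

A scan like `sites_within` checks every key, so it gives up the constant-time lookup; another option under the same assumptions would be to look up each 4 bp window of the 6 bp sequence in the table separately.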
All help is appreciated. Best.