Printing Out a Hash Table (Dictionary) Using a Hash Class in Python
10.0 years ago
st.ph.n

I have an assignment to create a hash table class in Python and use it on a text file of restriction enzymes, with the sequences in the first column and the RE names in the second. The goal is to generate a random 6 bp sequence and check whether it exists in the text file by printing the corresponding enzyme name. I can do this with plain lists, by turning them into a dictionary and using get() to retrieve the enzyme name; however, I must create a class.

Here is the hash class and the random sequence generator (module HashTable.py):

```python
import random

class KeyValue:
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __str__(self):
        return str(self.key) + ":" + str(self.value)

class HashTable:

    def __init__(self, SIZE):
        i = 0
        self.list = []
        for i in range(SIZE):
            self.list.append([])
            i = i + 1
        self.SIZE = len(self.list)

    def getValue(self, key):
        h = self.hash(key)
        bucket = self.list[h]
        for kv in bucket:
            if kv.key == key:
                return kv.value

    def setValue(self, key, value):
        h = self.hash(key)
        # should search first so we don't put key in twice, but for now ignore
        self.list[h].append(KeyValue(key, value))

    def hash(self, key):
        i = 0
        total = 0
        while i < len(key):
            total = total + ord(key[i])
            i = i + 1
        return total % self.SIZE

def random_DNA(length):
    return ''.join(random.choice('ATCG') for _ in xrange(length))
```

Here is the script that imports the module:

```python
from HashTable import *

fh = open("restriction_enzymes.txt", "r")

num_lines = int(sum(1 for line in fh))
print num_lines

hashtable = HashTable(int(num_lines))

for line in fh:
    (key, value) = line.strip().split('t')
    hashtable.setValue(key, value)

print hashtable

DNA = random_DNA(6)
print DNA
print hashtable.getValue(DNA)
print hashtable.getValue('AACGTT')
```

First, I want to be able to print out the hash table so I can visualize the dictionary. When I try that with the line "print hashtable" after the table is created, the output is "<HashTable.HashTable instance at 0x7ff53555db90>", which I guess is its location in memory? Do I have to implement a `__str__` or `__repr__` method in the HashTable class? Can someone help with this?
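For reference: `print` calls `str()` on the object, and a class without `__str__` or `__repr__` falls back to the default representation, which shows the class name and the object's id (its memory address in CPython). Below is a minimal sketch of both methods for a bucket-based table like the one above. It assumes the same `list`/`SIZE` attributes and `KeyValue` objects, condenses the other methods without changing what they do, and is written for Python 3 (`range`, `print()`). The output formats and the EcoRI entry are illustrative choices, not part of the assignment.

```python
class KeyValue:
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __str__(self):
        return str(self.key) + ":" + str(self.value)


class HashTable:
    def __init__(self, SIZE):
        # one empty bucket (a list of KeyValue objects) per slot
        self.list = [[] for _ in range(SIZE)]
        self.SIZE = SIZE

    def hash(self, key):
        # same scheme as the question: sum of character codes modulo the table size
        return sum(ord(c) for c in key) % self.SIZE

    def setValue(self, key, value):
        self.list[self.hash(key)].append(KeyValue(key, value))

    def getValue(self, key):
        for kv in self.list[self.hash(key)]:
            if kv.key == key:
                return kv.value
        return None

    def __str__(self):
        # used by print(): one line per non-empty bucket, "index: key:value, key:value"
        lines = []
        for i, bucket in enumerate(self.list):
            if bucket:
                lines.append(str(i) + ": " + ", ".join(str(kv) for kv in bucket))
        return "\n".join(lines)

    def __repr__(self):
        # used by repr() and the interactive prompt: a short summary instead of an address
        entries = sum(len(bucket) for bucket in self.list)
        return "HashTable(SIZE=%d, entries=%d)" % (self.SIZE, entries)


table = HashTable(10)
table.setValue("AACGTT", "AclI")
table.setValue("GAATTC", "EcoRI")
print(table)        # 6: AACGTT:AclI, GAATTC:EcoRI
print(repr(table))  # HashTable(SIZE=10, entries=2)
```

Both example keys land in bucket 6 because they contain the same letters, so the character-sum hash cannot tell them apart; the per-bucket list is what keeps both entries retrievable.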

As for the main output of the program: if I pass the random sequence, DNA, to "print hashtable.getValue(DNA)", I get "None". OK, so that random sequence isn't in the file. So I copied and pasted the first sequence from the text file, 'AACGTT', which corresponds to the RE AclI. However, I still get "None". Does anyone have an idea what I'm doing wrong?
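A likely cause worth checking: `sum(1 for line in fh)` reads the file to the end, so the following `for line in fh:` loop receives no lines, `setValue` is never called, and every `getValue` falls off the end of its loop and returns `None`. Separately, `split('t')` splits on the lowercase letter t, not on a tab (`'\t'`). The sketch below shows the exhausted-file behavior and two ways around it, using an `io.StringIO` as a stand-in for `restriction_enzymes.txt`; it assumes the file is tab-separated, and the EcoRI line is illustrative. It uses a plain dict so it runs on its own.

```python
import io

# Illustrative stand-in for restriction_enzymes.txt, assuming tab-separated
# "sequence<TAB>enzyme name" lines (the EcoRI line is an example, not from the question)
DATA = "AACGTT\tAclI\nGAATTC\tEcoRI\n"

# The problem: counting lines reads the file to the end
fh = io.StringIO(DATA)
num_lines = sum(1 for line in fh)
leftover = list(fh)              # nothing left to read
print(num_lines, leftover)       # 2 []

# Fix 1: rewind to the start before the second pass
fh.seek(0)
pairs = [line.rstrip("\n").split("\t") for line in fh]
print(pairs)                     # [['AACGTT', 'AclI'], ['GAATTC', 'EcoRI']]

# Fix 2: read the lines once and reuse the list for both the count and the loop
fh = io.StringIO(DATA)
lines = fh.read().splitlines()
num_lines = len(lines)
table = {}
for line in lines:
    key, value = line.split("\t")   # split on the tab, not on the letter 't'
    table[key] = value
print(num_lines, table.get("AACGTT"))   # 2 AclI
```

In the question's script, either fix would go between the line count and the `for` loop; `fh.seek(0)` works the same way on a real file object.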

All the sequences in the text file are 4, 6, or 8 bp long. Since the random sequence is 6 bp, will the hash lookup find a 4 or 8 bp entry if the 6 bp sequence matches part of it? Or should I implement a function that retrieves entries based on nucleotide matches?
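On the last point: a hash lookup only finds a key that is exactly equal to the query. The hash function reduces the whole string to a bucket index, and `getValue` then compares complete keys with `==`, so a 6 bp query cannot return a 4 bp or 8 bp entry. Reporting overlaps needs a separate scan over the keys. A minimal sketch, assuming "match" means one sequence occurs as a substring of the other; the function name `find_related_sites` is hypothetical, and all entries except AACGTT/AclI are illustrative.

```python
def find_related_sites(query, sites):
    """Scan every entry for sequence overlap with `query`.

    Returns two lists of (sequence, enzyme) pairs:
      contained  - sites found inside the query (includes an exact match)
      containing - longer sites that contain the whole query
    """
    contained = []
    containing = []
    for seq, name in sorted(sites.items()):
        if seq in query:
            contained.append((seq, name))
        elif query in seq:
            containing.append((seq, name))
    return contained, containing


# Illustrative entries; only AACGTT/AclI is taken from the question
sites = {
    "AACGTT": "AclI",
    "GAATTC": "EcoRI",
    "GGCC": "HaeIII",
    "TCGA": "TaqI",
    "GCGGCCGC": "NotI",
}

print(find_related_sites("GCGGCC", sites))
# ([('GGCC', 'HaeIII')], [('GCGGCCGC', 'NotI')])
print(find_related_sites("AACGTT", sites))
# ([('AACGTT', 'AclI')], [])
```

This is a linear scan over every entry, which is reasonable for a small enzyme list; the hash table remains the right tool for the exact-match case.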

All help is appreciated. Best.


Is reinventing the Python dictionary part of the homework assignment? It seems like you could use a dictionary comprehension, something like `{k: v for k, v in [L.strip().split("\t") for L in fh]}`.

You would still need to build a class around the Python dict, including a `__str__` method and another method for extracting the "partial" matches you want to report.
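The comment's suggestion could be sketched as a thin class around a built-in `dict`, filled with the dictionary comprehension above and given its own `__str__`. The class name `EnzymeSites`, the `from_lines` constructor, and the output format are hypothetical choices; the input is assumed to be tab-separated, as in the comprehension. A partial-match method would be one more loop over `self.sites`.

```python
class EnzymeSites:
    """Hypothetical wrapper around a dict mapping recognition sequence -> enzyme name."""

    def __init__(self, sites):
        self.sites = dict(sites)

    @classmethod
    def from_lines(cls, lines):
        # the dictionary comprehension from the comment, skipping blank lines
        return cls({k: v for k, v in [L.strip().split("\t") for L in lines if L.strip()]})

    def get(self, seq):
        # None when the sequence is not a key, like dict.get()
        return self.sites.get(seq)

    def __len__(self):
        return len(self.sites)

    def __str__(self):
        # one "sequence<TAB>name" line per entry, sorted by sequence
        return "\n".join(seq + "\t" + name for seq, name in sorted(self.sites.items()))


# Illustrative input lines; a file object opened on the enzyme file could be passed instead
enzymes = EnzymeSites.from_lines(["AACGTT\tAclI\n", "GAATTC\tEcoRI\n", "\n"])
print(enzymes)
print(enzymes.get("AACGTT"))   # AclI
```

Because iterating over a file object yields its lines, `EnzymeSites.from_lines(fh)` would work directly on an open file.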

10.0 years ago

I don't think you need to re-invent Python's dictionary class. Why not:

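```python
import random
from json import dumps

def random_DNA(length):
    # random sequence generator from the question, included so this runs on its own
    return ''.join(random.choice('ATCG') for _ in range(length))

enzyme_sites = dict()

# "with" closes the file automatically, even if an error occurs
with open("restriction_enzymes.txt", "r") as fh:
    for line in fh:
        seq, name = line.rstrip().split()
        enzyme_sites[seq] = name

# here is a nice way to print our dictionary
print(dumps(enzyme_sites, indent=4))

DNA = random_DNA(6)
print(DNA)
# .get() returns None instead of raising KeyError when DNA is not a key
print(enzyme_sites.get(DNA))
print(enzyme_sites['AACGTT'])
```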

The json module has a dumps function that formats your dictionary nicely for printing.
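Two small variations on that printout, using only the standard library: `sort_keys=True` orders the JSON output by sequence, and `pprint.pformat` (or `pprint.pprint`) gives a Python-style view built from `repr()`, so the values do not have to be JSON-serializable. The entries below are illustrative.

```python
import json
from pprint import pformat

enzyme_sites = {"GAATTC": "EcoRI", "AACGTT": "AclI"}   # illustrative entries

# indent=4 puts one key per line; sort_keys=True orders the output by sequence
print(json.dumps(enzyme_sites, indent=4, sort_keys=True))

# pformat sorts dict keys by default and keeps a short dict on one line
print(pformat(enzyme_sites))   # {'AACGTT': 'AclI', 'GAATTC': 'EcoRI'}
```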
