Question: Printing Out Hash (Dictionary) Using Hash Class Python
6.3 years ago by st.ph.n (2.5k), Philadelphia, PA, wrote:

I have an assignment to create a hash class in Python and apply it to a text file of restriction enzymes, with sequences in the first column and the RE name in the second. The goal is to create a random 6 bp sequence and see whether it exists in the text file by printing out the corresponding enzyme name. I can do this with just lists, turning them into a dictionary and using get() to retrieve the corresponding enzyme name; however, I must create a class.

Here is the hash class, and random sequence generator (there may be indentation errors when pasting):

> class KeyValue:
>     def __init__(self,key,value):
>         self.key=key
>         self.value=value
>     def __str__(self):
>         return str(self.key)+":"+str(self.value)
> 
> class HashTable:
> 
>     def __init__(self, SIZE):
>         i=0
>         self.list=[]
>         for i in range(SIZE):
>             self.list.append([])
>             i=i+1
>         self.SIZE = len(self.list)
> 
>     def getValue(self,key):
>         h = self.hash(key)
>         bucket = self.list[h]
>         for kv in bucket:
>             if kv.key==key:
>                 return kv.value
>     def setValue(self,key,value):
>         h = self.hash(key)
>         # should search first so we don't put key in twice, but for now ignore
>         self.list[h].append(KeyValue(key,value))
> 
>     def hash(self, key):
>         i=0
>         total=0
>         while i<len(key):
>             total = total+ord(key[i])
>             i=i+1
>         return total % self.SIZE
> 
> import random
> 
> def random_DNA(length):
>     # range works on both Python 2 and 3 (xrange is Python 2 only)
>     return ''.join(random.choice('ATCG') for _ in range(length))

Here is the code, used to import the module:

from HashTable import *

fh = open("restriction_enzymes.txt", "r")

num_lines = int(sum(1 for line in fh))

print num_lines

hashtable = HashTable(int(num_lines))

for line in fh:
    (key, value) = line.strip().split('\t')
    hashtable.setValue(key, value)

print hashtable

DNA = random_DNA(6)
print DNA
print hashtable.getValue(DNA)
print hashtable.getValue('AACGTT')

First, I want to be able to print out the hash table so I can visualize the dictionary. When I attempt that with the line "print hashtable" after it is created, I get the output "HashTable.HashTable instance at 0x7ff53555db90", which I guess is the location in memory. Do I have to implement a __str__ or __repr__ method in the HashTable class? Can someone assist with this?
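
On the printing question: Python falls back to the default "instance at 0x..." text when a class defines neither __str__ nor __repr__, so adding one of them is the usual fix. Below is a minimal sketch, assuming Python 3 and a trimmed-down copy of the classes from the question. The one-line-per-bucket layout is only one possible choice, and the EcoRI row is sample data, not taken from the asker's file.

```python
class KeyValue:
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __str__(self):
        return str(self.key) + ":" + str(self.value)


class HashTable:
    def __init__(self, size):
        # One empty bucket (list) per slot
        self.list = [[] for _ in range(size)]
        self.SIZE = size

    def hash(self, key):
        # Same scheme as the question: sum of character codes modulo table size
        return sum(ord(ch) for ch in key) % self.SIZE

    def setValue(self, key, value):
        self.list[self.hash(key)].append(KeyValue(key, value))

    def __str__(self):
        # One line per non-empty bucket: "index: key:value, key:value"
        lines = []
        for index, bucket in enumerate(self.list):
            if bucket:
                lines.append(str(index) + ": " + ", ".join(str(kv) for kv in bucket))
        return "\n".join(lines)


table = HashTable(4)
table.setValue("AACGTT", "AclI")
table.setValue("GAATTC", "EcoRI")  # sample row, same letters as AACGTT
print(table)
```

Both sample keys contain the same letters, so the character-sum hash puts them in the same bucket. Printing the table this way makes such collisions easy to spot.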

Regarding the main output of the program: if I pass the variable DNA (the random seq) to "print hashtable.getValue(DNA)", I get the output "None". OK, so that random seq isn't in there. So I tried copying and pasting the first seq from the text file, 'AACGTT', which corresponds to the RE AclI. However, I still get the output "None". Does anyone have any ideas what I'm doing wrong here?
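
One likely explanation, assuming the script runs exactly as posted: counting the lines with sum(1 for line in fh) reads the file handle to the end, so the later "for line in fh" loop has nothing left to iterate over, the table stays empty, and every getValue() call returns None. The sketch below (Python 3) uses io.StringIO as a stand-in for the real file, with two sample rows that are not from the asker's data, to show the effect and one possible fix.

```python
import io

# Stand-in for open("restriction_enzymes.txt"); the rows are sample data
fh = io.StringIO("AACGTT\tAclI\nGAATTC\tEcoRI\n")

# Counting the lines consumes the whole file
num_lines = sum(1 for line in fh)

# A second pass over the same handle yields nothing
leftover = [line for line in fh]

# Rewind to the start before reading again
fh.seek(0)
rows = [line.rstrip("\n").split("\t") for line in fh]

print(num_lines, leftover, rows)
```

Reading the file once into a list and taking len() of that list would avoid the second pass altogether.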

All the seq identifiers in the text file are either 4, 6, or 8 bp long. Since the random sequence is 6 bp, will a lookup through the hash function find an entry whose key is 4 or 8 bp long when the 6 bp seq matches part of it? Or should I implement a function to pull entries based on nucleotide matches?
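
A hash lookup only finds keys that are exactly equal to the query, so a 6 bp sequence will not retrieve a 4 or 8 bp site even when one contains the other. A separate scan is needed for that. Here is a minimal sketch in Python 3, where partial_matches is a hypothetical helper name and the three sites are sample rows rather than the contents of the asker's file. It reports sites that occur inside the query and sites that the query occurs inside.

```python
def partial_matches(query, sites):
    """Return (site, name) pairs where the site and the query overlap by containment."""
    hits = []
    for site, name in sites.items():
        # Shorter site inside the query, or the query inside a longer site
        if site in query or query in site:
            hits.append((site, name))
    return sorted(hits)


# Sample data only: a 4 bp, a 6 bp, and an 8 bp recognition site
sites = {"GGCC": "HaeIII", "GAATTC": "EcoRI", "GCGGCCGC": "NotI"}

matches = partial_matches("GGCCGC", sites)
print(matches)
```

This scan visits every entry, so it is linear in the number of enzymes; that is a reasonable trade-off for a file of restriction sites, but it gives up the constant-time lookup that the hash table provides for exact keys.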

All help is appreciated. Best.

python homework • 4.2k views
modified 6.3 years ago by Matt Shirley (9.4k) • written 6.3 years ago by st.ph.n (2.5k)

Is reinventing the Python dictionary part of the homework assignment? It seems like you could use a dictionary comprehension, something like {k:v for k,v in [L.strip().split("\t") for L in fh]}

You would still need to build a class around the Python dict, including a __str__ method and another method for extracting the "partial" matches you want to report.

written 6.3 years ago by David W (4.8k)

6.3 years ago by Matt Shirley (9.4k), Cambridge, MA, wrote:

I don't think you need to re-invent Python's dictionary class. Why not:

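from json import dumps

enzyme_sites = dict()

# "with" closes the file even if a line fails to parse
with open("restriction_enzymes.txt", "r") as fh:
    for line in fh:
        seq, name = line.rstrip().split()
        enzyme_sites[seq] = name

# here is a nice way to print our dictionary
print(dumps(enzyme_sites, indent=4))

# random_DNA() is the generator defined in the question
DNA = random_DNA(6)
print(DNA)
# .get() returns None instead of raising KeyError when the random site is absent
print(enzyme_sites.get(DNA))
print(enzyme_sites['AACGTT'])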

The json module has a dumps function that will format your dictionary nicely for printing.

modified 6.3 years ago • written 6.3 years ago by Matt Shirley (9.4k)