Question: [Python] How to convert a list to a dictionary when dict() does not work, without using BioPython
2.9 years ago by
silvie.school0 wrote:

Hello everybody,

Sample file:

>sp|Q6GZX2|003R_FRG3G  (438 aa)
Uncharacterized protein 3R.  [Frog virus 3 (isolate Goorha) (FV-3)]
MARPLLGKTSSVRRRLESLSACSIFFFLRKFCQKMASLVFLNSPVYQMSNILLTERRQVDRAMGGSDDDGVMVVALSPSD
FKTVLGSALLAVERDMVHVVPKYLQTPGILHDMLVLLTPIFGEALSVDMSGATDVMVQQIATAGFVDVDPLHSSVSWKDN
VSCPVALLAVSNAVRTMMGQPCQVTLIIDVGTQNILRDLVNLPVEMSGDLQVMAYTKDPLGKVPAVGVSVFDSGSVQKGD
AHSVGAPDGLVSFHTHPVSSAVELNYHAGWPSNVDMSSLLTMKNLMHVVVAEEGLWTMARTLSMQRLTKVLTDAEKDVMR
AAAFNLFLPLNELRVMGTKDSNNKSLKTYFEVFETFTIGALMKHSGVTPTAFVDRRWLDNTIYHMGFIPWGRDMRFVVEY
DLDGTNPFLNTVPTLMSVKRKAKIQEMFDNMVSRMVTS
      2 - 9:          ArpllGKT

Sample code:

 def get_sequence(): 
     try:
         with open("Filename.txt") as f:
             file = f.readlines()
             raw_data = ''
             start_reading = False
             for line in file:
                 if line.startswith(">"):
                     start_reading = True
                 if start_reading:
                     raw_data += line
             sequence = raw_data.split(">") 
             sequence = sequence[1:]       
     except IOError:
         print('Some meaningful message')
         quit()
     finally:
         print(sequence[0])
         print(sequence[1])
         dict(sequence)
         return sequence

My question is how can I convert the list sequence to a dictionary? It would be really nice if the organism is the key and the value is a list of the other data. The dict() method raises a ValueError.

This is a school assignment, so I'm not allowed to use BioPython.

Thanks in advance!

software error • 6.7k views
ADD COMMENTlink modified 2.9 years ago by WouterDeCoster37k • written 2.9 years ago by silvie.school0

How can I post code without this messed up layout?

ADD REPLYlink written 2.9 years ago by silvie.school0

Above the text box you write in, there should be a number of box icons which you can use to edit the text... Highlight your code and then click the little box with the ones and zeroes in it.

ADD REPLYlink written 2.9 years ago by James Ashmore2.6k

Thanks! It works.

ADD REPLYlink written 2.9 years ago by silvie.school0
2.9 years ago by
Zaag720
Amsterdam
Zaag720 wrote:

Maybe try something like this:

from collections import defaultdict

d = defaultdict(list)
d[sequence[0]].append(sequence[1:])
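
A minimal sketch of how defaultdict(list) groups values under a key, assuming the data has already been split into (key, value) pairs. The function name group_records and the toy pairs are made up for illustration. Note that defaultdict must be given the type list itself; passing a list object such as [] or list() raises "TypeError: first argument must be callable or None".

```python
from collections import defaultdict


def group_records(records):
    """Group (key, value) pairs into a plain dict of lists."""
    grouped = defaultdict(list)  # pass the type `list`, not `list()` or `[]`
    for key, value in records:
        # A missing key is created on first access with an empty list
        grouped[key].append(value)
    return dict(grouped)


# Hypothetical toy data, not taken from the real file
pairs = [("frog", "MARPLL"), ("human", "MKTAYI"), ("frog", "MSVKRK")]
grouped = group_records(pairs)
```

Because every value is a list, two records that share a key are both kept instead of the second overwriting the first.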
ADD COMMENTlink written 2.9 years ago by Zaag720

Thanks for your reply! I've tried your solution and it raises TypeError: first argument must be callable. Now I've done the following:

def maak_dictionary(sequence):
    d = {}
    for k, v in sequence:
        d[k].append(v)
    print(d)

and it raises ValueError: too many values to unpack (expected 2). What can I do? Maybe I should forget about the dictionary and work with a list? But my teacher recommends that I use a dictionary, and I need to search through the data to find the human sequences that match a specific regex. I cannot ask my teacher for help.
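
Both errors most likely have the same cause: each element of sequence is one long string, while dict() and "for k, v in ..." expect every element to be a pair of exactly two items. A sketch of one way around it, assuming the first line of each record should become the key and the remaining lines the value (records_to_dict and the toy records are hypothetical):

```python
def records_to_dict(records):
    """Turn record strings into {first_line: remaining_text}."""
    result = {}
    for record in records:
        # partition() splits at the first newline only, so it always
        # yields exactly one header part and one body part
        header, _, body = record.partition("\n")
        result[header.strip()] = body.replace("\n", "")
    return result


# Hypothetical toy records, shaped like the output of raw_data.split(">")[1:]
records = ["id1 first\nAAAA\nCCCC\n", "id2 second\nGGGG\n"]
as_dict = records_to_dict(records)
```

Calling dict(records) directly on the same list fails with a ValueError, because each string is longer than two characters and so cannot be read as a (key, value) pair.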

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by silvie.school0
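from collections import defaultdict

d = defaultdict(list)
header = None
raw_data = ''

with open("Filename.txt") as f:
    for line in f:
        # strip() returns a new string, so the result has to be kept
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: store the previous one first
            if header is not None and '[' in header:
                d[header].append(raw_data)
            header = line
            raw_data = ''
        elif header is None:
            # Skip anything before the first ">" line
            continue
        elif '[' in line:
            # The description line holds the organism in square brackets
            header += ' ' + line
        else:
            raw_data += line

# Store the last record, which is not followed by another ">" line
if header is not None and '[' in header:
    d[header].append(raw_data)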

This seems to get it into a dict, but you'll need to do some parsing on the text, I guess.
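
The parsing could look roughly like this: a sketch that pulls the organism out of the square brackets with a regular expression, assuming every header has at most one [...] block with no nested square brackets. The name split_header is made up for illustration.

```python
import re

# Text between the first "[" and the next "]"
ORGANISM = re.compile(r"\[([^\]]+)\]")


def split_header(header):
    """Return (organism, description); organism is None without brackets."""
    match = ORGANISM.search(header)
    if match is None:
        return None, header.strip()
    organism = match.group(1)
    description = header[:match.start()].strip()
    return organism, description


# Header and description line of the sample record, joined with a space
header = (">sp|Q6GZX2|003R_FRG3G  (438 aa) "
          "Uncharacterized protein 3R.  [Frog virus 3 (isolate Goorha) (FV-3)]")
organism, description = split_header(header)
```

The round parentheses inside the organism name are left alone, since the pattern only stops at a closing square bracket.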

ADD REPLYlink written 2.9 years ago by Zaag720
2.9 years ago by
lmanohara9920
Sri Lanka
lmanohara9920 wrote:
import sys

try:
    with open("sequence.txt") as f:
        raw_data = ''
        start_reading = False
        for line in f:
            if line.startswith(">"):
                start_reading = True
            if start_reading:
                raw_data += line
except IOError:
    print('Some meaningful message')
    sys.exit(1)
else:
    # Runs only if the file was read; split on ">" and drop the
    # empty string in front of the first record
    sequence = raw_data.split(">")[1:]
    sequenceDict = dict(mouse=sequence)
    print(sequenceDict)

I think this should help you. Please refer to https://docs.python.org/2/tutorial/datastructures.html#dictionaries

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by lmanohara9920

Thanks for your reply! This is the best solution I've seen so far. There is only one problem. I made a test file with 2 records similar to the one shown in the example above. The original file has hundreds of records. The program places everything under the same key. Is there a possibility to place every record under a different key?

ADD REPLYlink written 2.9 years ago by silvie.school0
2.9 years ago by
lmanohara9920
Sri Lanka
lmanohara9920 wrote:

In that case, you could use something like this, because the solution above cannot add keys dynamically.

sequenceDict = {}
for index, elem in enumerate(sequence):
    sequenceDict[index] = elem
print(sequenceDict)
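
If a running index is not a useful key, a variation is to key each record by its accession instead. This is only a sketch, assuming headers follow the "sp|ACCESSION|NAME" layout of the sample record and that the leading ">" has already been removed by the split; accession_of is a hypothetical helper and the second toy record is invented.

```python
def accession_of(record):
    """Return the accession from an 'sp|ACCESSION|NAME ...' header line."""
    header = record.split("\n", 1)[0]
    fields = header.split("|")
    # Fall back to the whole header if it does not have three fields
    return fields[1] if len(fields) >= 3 else header.strip()


# Toy records shaped like the elements of `sequence`
sequence = [
    "sp|Q6GZX2|003R_FRG3G  (438 aa)\nMARPLLGKTS\n",
    "sp|X00001|TEST_HUMAN  (10 aa)\nMKTAYIAKQR\n",  # invented entry
]
by_accession = {accession_of(record): record for record in sequence}
```

Accessions should be unique per record, so no entry overwrites another, which is not guaranteed when the organism name is the key.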
ADD COMMENTlink written 2.9 years ago by lmanohara9920
2.9 years ago by
Belgium
WouterDeCoster37k wrote:

You want Frog virus 3 (isolate Goorha) (FV-3) to be the key in your dictionary and the rest the value? Assuming that and assuming that your file is not too big (and as such can be comfortably kept in memory) I have some code you could try...

import sys

def makedict(datalist):
    # Replace both brackets with a marker that should not occur in the
    # file, then split on it to isolate the species name
    data = ' '.join(datalist).replace('[', '%@%@').replace(']', '%@%@')
    info = data.split('%@%@')
    return (info[1], [info[0], info[2]])

with open(sys.argv[1]) as infile:
    outdict = {}
    # Strip every line and drop the empty ones
    content = [line.strip() for line in infile if line.strip()]
    tempdata = []
    for line in content:
        if line.startswith('>'):
            if tempdata:
                key, value = makedict(tempdata)
                outdict[key] = value
            # Start collecting the next record
            tempdata = [line]
        else:
            tempdata.append(line)
    else:
        # Convert the last record once the loop has finished
        if tempdata:
            key, value = makedict(tempdata)
            outdict[key] = value

Notice:

- the list comprehension to properly format the input

- the else clause on the for loop to also convert the last entry

- storing the lines in tempdata and resetting it after creating a dict key and value based on it

- in the makedict function I replace brackets by something we do not expect in the file at all, which I subsequently use for splitting, allowing me to isolate the species name

- the function makedict returns a tuple with the key as the first element and the rest of the info as the second

Since I do not have your (complete) input file, I haven't been able to test it, so let me know if something goes wrong that you can't fix.
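
Once the dictionary exists, the search for human sequences mentioned earlier in the thread could be sketched as follows. Everything here is an assumption for illustration: find_matches is a hypothetical helper, the human record is invented, "Homo sapiens" is assumed to appear in the organism name, and the pattern is a guess based on the "ArpllGKT" hit in the sample file. The values are assumed to have the [info, sequence] shape that makedict returns.

```python
import re


def find_matches(outdict, organism_part, pattern):
    """Return {organism: sequence} for matching organisms and sequences."""
    regex = re.compile(pattern)
    hits = {}
    for organism, (info, sequence) in outdict.items():
        # makedict joined the lines with spaces, so remove them first
        sequence = sequence.replace(" ", "")
        if organism_part in organism and regex.search(sequence):
            hits[organism] = sequence
    return hits


# Toy dictionary in the {organism: [info, sequence]} shape
outdict = {
    "Homo sapiens (Human)": [  # invented record
        ">sp|X00001|TEST_HUMAN Hypothetical protein. ", " MARPLLGKTS SVRR"],
    "Frog virus 3 (isolate Goorha) (FV-3)": [
        ">sp|Q6GZX2|003R_FRG3G  (438 aa) Uncharacterized protein 3R.  ",
        " MARPLLGKTSSVRRRLES"],
}
hits = find_matches(outdict, "Homo sapiens", r"[AG].{4}GK[ST]")
```

Keep in mind that a plain dict holds one value per key, so if the file contains several records for the same organism, only the last one survives unless the values are collected in a list.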

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by WouterDeCoster37k