Question

[Python] How to convert a list to a dictionary if dict() does not work without using BioPython

8.0 years ago

silvie.school • 0

Hello everybody,

Sample file:

>sp|Q6GZX2|003R_FRG3G  (438 aa)
Uncharacterized protein 3R.  [Frog virus 3 (isolate Goorha) (FV-3)]
MARPLLGKTSSVRRRLESLSACSIFFFLRKFCQKMASLVFLNSPVYQMSNILLTERRQVDRAMGGSDDDGVMVVALSPSD
FKTVLGSALLAVERDMVHVVPKYLQTPGILHDMLVLLTPIFGEALSVDMSGATDVMVQQIATAGFVDVDPLHSSVSWKDN
VSCPVALLAVSNAVRTMMGQPCQVTLIIDVGTQNILRDLVNLPVEMSGDLQVMAYTKDPLGKVPAVGVSVFDSGSVQKGD
AHSVGAPDGLVSFHTHPVSSAVELNYHAGWPSNVDMSSLLTMKNLMHVVVAEEGLWTMARTLSMQRLTKVLTDAEKDVMR
AAAFNLFLPLNELRVMGTKDSNNKSLKTYFEVFETFTIGALMKHSGVTPTAFVDRRWLDNTIYHMGFIPWGRDMRFVVEY
DLDGTNPFLNTVPTLMSVKRKAKIQEMFDNMVSRMVTS
      2 - 9:          ArpllGKT

Sample code:

 def get_sequence(): 
     try:
         with open("Filename.txt") as f:
             file = f.readlines()
             raw_data = ''
             start_reading = False
             for line in file:
                 if line.startswith(">"):
                     start_reading = True
                 if start_reading:
                     raw_data += line
             sequence = raw_data.split(">") 
             sequence = sequence[1:]       
     except IOError:
         print('Some meaningfull message')
         quit()
     finally:
         print(sequence[0])
         print(sequence[1])
         dict(sequence)
         return sequence

My question is how can I convert the list sequence to a dictionary? It would be really nice if the organism is the key and the value is a list of the other data. The dict() method raises a ValueError.

This is a school assignment, so I'm not allowed to use BioPython.

Thanks in advance!

software error • 10k views

ADD COMMENT • link updated 8.0 years ago by WouterDeCoster 47k • written 8.0 years ago by silvie.school • 0

How can I post code without this messed up layout?

ADD REPLY • link 8.0 years ago by silvie.school • 0

Above the text box you write in, there should be a number of box icons which you can use to edit the text... Highlight your code and then click the little box with the ones and zeroes in it.

ADD REPLY • link 8.0 years ago by James Ashmore ★ 3.4k

Thanks! It works.

ADD REPLY • link 8.0 years ago by silvie.school • 0

score 0 · Answer 1 · 2016-04-16

8.0 years ago

Zaag ▴ 860

Maybe try something like this:

from collections import defaultdict

d = defaultdict(list)
d[sequence[0]].append(sequence[1:])
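
For reference, here is a minimal sketch of how defaultdict(list) groups values under a key. The records list is made up for illustration and is not the asker's data. Note that the type list itself is passed, not list(); passing a non-callable such as an actual list is one way to get a "first argument must be callable" TypeError.

```python
from collections import defaultdict

# Hypothetical (organism, sequence) pairs, invented for illustration
records = [
    ("Frog virus 3", "MARPLL"),
    ("Homo sapiens", "MKTAYI"),
    ("Frog virus 3", "MSIIGA"),
]

# Pass the list type itself, not list(): the first argument must be callable
d = defaultdict(list)
for organism, seq in records:
    d[organism].append(seq)  # a missing key starts out as an empty list

print(dict(d))
```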

ADD COMMENT • link 8.0 years ago by Zaag ▴ 860

Thanks for your reply! I've tried your solution and it raises TypeError: first argument must be callable. Now I've done the following:

    def maak_dictionary(sequence):
        d = {}
        for k, v in sequence:
            d[k].append(v)
        print(d)

and it raises ValueError: too many values to unpack (expected 2). What can I do? Maybe I should forget about the dictionary and work with a list? But my teacher recommends that I use a dictionary, and I need to search through the data to find the human sequences that match a specific regex. I cannot ask my teacher for help.

ADD REPLY • link 8.0 years ago by silvie.school • 0

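from collections import defaultdict

d = defaultdict(list)

with open("Filename.txt") as f:
    header = None
    raw_data = ''
    for line in f:
        line = line.strip()  # strip() returns a new string, so reassign it
        if line.startswith(">"):
            # A new record starts: store the previous one first
            if header is not None:
                d[header].append(raw_data)
            header = line
            raw_data = ''
        elif header is None:
            continue  # skip anything before the first record
        elif '[' in line:
            # The description line (organism in brackets) joins the header
            header += ' ' + line
        else:
            raw_data += line

    # Store the last record, which has no following ">" line to trigger it
    if header is not None:
        d[header].append(raw_data)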

This seems to get it into a dict, but you'll need to do some parsing on the text, I guess.
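
As one possible way to do that parsing, the sketch below pulls the organism name out of the square brackets with a regular expression. The helper name split_header is made up for illustration, and it assumes the header and description lines have been joined into one string, with the organism being the only bracketed part.

```python
import re

def split_header(header):
    """Return (organism, description) from a combined header string."""
    # Greedy match: from the first "[" to the last "]"
    match = re.search(r"\[(.+)\]", header)
    if match is None:
        return None, header.strip()
    organism = match.group(1)
    description = header[:match.start()].strip()
    return organism, description

# Header and description line from the sample record, joined with a space
header = (">sp|Q6GZX2|003R_FRG3G  (438 aa) "
          "Uncharacterized protein 3R.  [Frog virus 3 (isolate Goorha) (FV-3)]")
organism, description = split_header(header)
print(organism)
```

The organism string could then serve as the dictionary key, with the description and sequence stored as the value.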

ADD REPLY • link 8.0 years ago by Zaag ▴ 860

score 0 · Answer 2 · 2016-04-16

8.0 years ago

lmanohara99 ▴ 20

try:
    with open("sequence.txt") as f:
        file = f.readlines()
except IOError:
    print('Some meaningful message')
    quit()

# Collect everything from the first ">" line onwards
raw_data = ''
start_reading = False
for line in file:
    if line.startswith(">"):
        start_reading = True
    if start_reading:
        raw_data += line

# Split into records; the part before the first ">" is empty, so drop it
sequence = raw_data.split(">")
sequence = sequence[1:]

# Store the whole list under a single key
sequenceDict = dict(mouse=sequence)
print(sequenceDict)

I think this should help. Please refer to https://docs.python.org/2/tutorial/datastructures.html#dictionaries
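
To illustrate why dict(sequence) fails while dict(mouse=sequence) works, here is a small sketch of the input forms dict() accepts. The strings are placeholders, not real records.

```python
# Placeholder records, invented for illustration
sequence = ["record one", "record two"]

# Keyword form: one key ("mouse") holding the whole list
by_keyword = dict(mouse=sequence)

# Pair form: each element must be a (key, value) pair
by_pairs = dict([("a", "record one"), ("b", "record two")])

# A list of plain strings is not a list of pairs, so dict() rejects it
try:
    dict(sequence)
except ValueError as err:
    error_message = str(err)
    print(error_message)
```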

ADD COMMENT • link 8.0 years ago by lmanohara99 ▴ 20

Thanks for your reply! This is the best solution I've seen so far. There is only one problem. I made a test file with 2 records similar to the one shown in the example above. The original file has hundreds of records. The program places everything under the same key. Is there a possibility to place every record under a different key?

ADD REPLY • link 8.0 years ago by silvie.school • 0

score 0 · Answer 3 · 2016-04-16

8.0 years ago

lmanohara99 ▴ 20

In that case, you could use something like this, because the solution above cannot add keys dynamically.

    sequenceDict = {}
    for index, elem in enumerate(sequence):
        sequenceDict[index] = elem  # use the record's position as its key
    print(sequenceDict)

ADD COMMENT • link 8.0 years ago by lmanohara99 ▴ 20

score 0 · Answer 4 · 2016-04-16

You want Frog virus 3 (isolate Goorha) (FV-3) to be the key in your dictionary and the rest the value? Assuming that and assuming that your file is not too big (and as such can be comfortably kept in memory) I have some code you could try...

import sys

def makedict(datalist):
    # Replace both brackets with a marker that should never occur in the file,
    # then split on it to isolate the species name
    data = ' '.join(datalist).replace('[', '%@%@').replace(']', '%@%@')
    info = data.split('%@%@')
    return (info[1], [info[0], info[2]])

with open(sys.argv[1]) as infile:
    outdict = {}
    # Strip whitespace and drop empty lines
    content = [line.strip() for line in infile if line.strip()]
    tempdata = []
    for line in content:
        if line.startswith('>'):
            if len(tempdata) != 0:
                key, value = makedict(tempdata)
                outdict[key] = value
            # Start collecting the new record
            tempdata = [line]
        else:
            tempdata.append(line)
    else:
        # Runs once the loop has finished: convert the last record
        if tempdata:
            key, value = makedict(tempdata)
            outdict[key] = value

Notice:

-the list comprehension to clean up the input lines

-the else clause on the for loop, which also converts the last entry

-storing each record's lines in tempdata and resetting it after creating a dict key and value from it

-in the makedict function I replace the brackets with a marker we do not expect in the file at all, which I then split on to isolate the species name

-the makedict function returns a tuple: first the key, then the rest of the info

Since I do not have your (complete) input file, I haven't been able to test it, so let me know if something goes wrong that you can't fix.
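
One thing to watch for: if several records share the same organism, outdict[key] = value keeps only the last of them. The sketch below shows one way around that by collecting records in a list per organism, and then searches the human records with a regex, as mentioned in the question. The accessions, the sequences and the "Homo sapiens" key text are invented for illustration, and the pattern is only a guess based on the "ArpllGKT" match line in the sample file.

```python
import re

# Hypothetical (organism, [header, sequence]) tuples, shaped like makedict's
# return value; accessions and sequences are invented for illustration
parsed = [
    ("Homo sapiens (Human)", [">sp|X1|A_HUMAN", "MARPLLGKTSS"]),
    ("Frog virus 3 (isolate Goorha) (FV-3)", [">sp|Q6GZX2|003R_FRG3G", "MARPLLGKTSSVRRR"]),
    ("Homo sapiens (Human)", [">sp|X2|B_HUMAN", "MKVLAAGIV"]),
]

# Keep every record per organism instead of overwriting earlier ones
outdict = {}
for key, value in parsed:
    outdict.setdefault(key, []).append(value)

# Assumed pattern: A or G, any four residues, then G, K, and S or T
pattern = re.compile(r"[AG].{4}GK[ST]")

hits = []
for organism, records in outdict.items():
    if "Homo sapiens" not in organism:
        continue  # only search the human records
    for header, seq in records:
        # Remove the spaces left over from joining the sequence lines
        if pattern.search(seq.replace(" ", "")):
            hits.append(header)

print(hits)
```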