I am having a problem getting the list of amino acid values for every sequence from a dict in Python!
5.3 years ago

Dear all,

I am having a problem with some code I wrote. I made a dictionary in Python whose keys are amino acids (single-letter codes) and whose values give how many times each code is present. As a final step, I read a FASTA file and wrote code that should give the list of values for each sequence in the file. However, I only get the values for the last sequence and nothing for the other 4 sequences (there are 5 in total).

aa_code = {
'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2,
'G': 4, 'H': 2, 'I': 3, 'K': 2, 'L': 6, 
'M': 1, 'N': 2, 'P': 4, 'Q': 2, 'R': 6, 
'S': 6, 'T': 4, 'V': 4, 'W': 1, 'Y': 2,
}
dictionary python biopython protein • 3.7k views
from Bio import SeqIO
for protein_seq in SeqIO.parse("protein.fasta","fasta"):
      for i in list(protein_seq):
         combinations = list()
         for value in list(protein_seq):
              combinations.append(aa_code[value])
      print(combinations)

This prints the right answer:

[4, 2, 2, 6]

[2, 2, 2, 4, 4]

[6, 4, 2, 4]

[6, 1, 6, 1, 3, 2, 2, 4, 2, 2, 2]

[2, 2, 4, 4, 4, 1, 4, 2, 2, 4, 2]

5.3 years ago

Hello macadamopetrus,

I don't understand what this dictionary is for.

But your print(combinations) is outside the first for loop, so it only runs once, after the loop has finished. You have to put it inside the loop.
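
To illustrate the difference, here is a minimal sketch that uses plain strings as hypothetical stand-ins for the FASTA records (so it runs without Biopython or the input file). The names sequences, printed_inside and last_only are made up for this example.

```python
# The dictionary from the question.
aa_code = {
    'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2, 'G': 4, 'H': 2, 'I': 3, 'K': 2, 'L': 6,
    'M': 1, 'N': 2, 'P': 4, 'Q': 2, 'R': 6, 'S': 6, 'T': 4, 'V': 4, 'W': 1, 'Y': 2,
}

# Hypothetical stand-ins for the sequences in the FASTA file.
sequences = ["VFHL", "YKDTT", "RTKT"]

printed_inside = []
for seq in sequences:
    combinations = [aa_code[residue] for residue in seq]
    print(combinations)                  # inside the loop: runs once per sequence
    printed_inside.append(combinations)  # record what was printed

# Outside the loop, `combinations` only holds the list built in the final iteration.
last_only = combinations
print(last_only)
```

Printing inside the loop gives one list per sequence; printing after it gives only the list for the final sequence.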

fin swimmer


No, this doesn't work, as it gives a lot of lists. But when I put print(combinations) right after the first for line, it prints the lists, just not in the order I expect.

5.3 years ago
Joe 21k

Does this do what you’re trying to do? (I changed the printing format slightly for testing)

from Bio import SeqIO

aa_code = { 'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2, 'G': 4, 'H': 2, 'I': 3, 'K': 2, 'L': 6, 'M': 1, 'N': 2, 'P': 4, 'Q': 2, 'R': 6, 'S': 6, 'T': 4, 'V': 4, 'W': 1, 'Y': 2}

for protein_seq in SeqIO.parse("testproteins.fa", "fasta"):
    x = protein_seq.seq
    combinations = [aa_code[value] for value in x]
    print("{} -> {}".format(x, combinations))

For your example sequences above I get:

VFHL -> [4, 2, 2, 6]
YKDTT -> [2, 2, 2, 4, 4]
RTKT -> [6, 4, 2, 4]
RMLMIDYVCFQ -> [6, 1, 6, 1, 3, 2, 2, 4, 2, 2, 2]
DKGTPMAEQTY -> [2, 2, 4, 4, 4, 1, 4, 2, 2, 4, 2]
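
One caveat with aa_code[value]: it raises a KeyError for any letter that is not in the dictionary (for example X or * in some FASTA files). A possible sketch of a more forgiving lookup is below. The function name score_sequence, its default parameter and the upper-casing step are assumptions for illustration, not part of the original code.

```python
# The dictionary from the question.
aa_code = {
    'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2, 'G': 4, 'H': 2, 'I': 3, 'K': 2, 'L': 6,
    'M': 1, 'N': 2, 'P': 4, 'Q': 2, 'R': 6, 'S': 6, 'T': 4, 'V': 4, 'W': 1, 'Y': 2,
}

def score_sequence(sequence, default=None):
    """Hypothetical helper: map each residue to its aa_code value.

    Letters missing from aa_code map to `default` instead of raising KeyError.
    Upper-casing is an assumption, to tolerate lower-case input.
    """
    return [aa_code.get(residue, default) for residue in str(sequence).upper()]

print(score_sequence("VFHL"))
print(score_sequence("VFXL"))  # 'X' is not in aa_code
```

Whether to skip, flag or fail on unknown residues depends on what the values are used for, so treat the default as a placeholder.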

Old answer below, misunderstood OP's objective.

If you’re trying to do what I think you’re trying to do, do this instead:

from Bio import SeqIO
from collections import Counter
import sys

for r in SeqIO.parse(sys.argv[1], 'fasta'):
    print("{0}\n{1}".format(r.id, Counter(r.seq)))
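
For reference, this is roughly what Counter does when given a sequence. The sketch uses a plain string from the example sequences rather than a Biopython record, so it runs on its own.

```python
from collections import Counter

# Count how often each residue occurs in one of the example sequences.
counts = Counter("RMLMIDYVCFQ")

print(counts["M"])            # occurrences of methionine
print(counts["A"])            # residues that never occur count as 0
print(counts.most_common(1))  # the most frequent residue and its count
```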

No, this doesn't work, because what I am trying to do is take the values from the dictionary that correspond to the keys in my list (called "x"), make a list of those values, and print it. I think there is some problem with the for loop, because the code I wrote does partly work: if it can print the values for the last element, then why not for the previous 4?


I’m not clear on what your dictionary is for?

Are they “scores” for different amino acids of interest or something? In that case Fin swimmer’s suggestion should be the answer you need. You are only printing the combinations list once, outside of the loop, so you only see its final value.

EDIT: moved code up into the answer body to keep the thread organised.


Yes, I tried moving print(combinations) to just after the first for line in the code. This gives something like the following, where the first list should be printed last.

[2, 2, 4, 4, 4, 1, 4, 2, 2, 4, 2]

[4, 2, 2, 6]

[2, 2, 2, 4, 4]

[6, 4, 2, 4]

[6, 1, 6, 1, 3, 2, 2, 4, 2, 2, 2]

So it still doesn't print in the order I expect, which is:

[4, 2, 2, 6]

[2, 2, 2, 4, 4]

[6, 4, 2, 4]

[6, 1, 6, 1, 3, 2, 2, 4, 2, 2, 2]

[2, 2, 4, 4, 4, 1, 4, 2, 2, 4, 2]


Thank you. But why didn't my code work? I did write it logically.


You need to show us what you changed the code to, but it looks like you kept 2 print statements instead of just moving the last statement inside the loop as we said.

from Bio import SeqIO
for protein_seq in SeqIO.parse("protein.fasta","fasta"):
     print(combinations)
     x = list(protein_seq.seq)
     combinations = list()
     for value in x:
            combinations.append(aa_code[value])

This is the output after adding print(combinations) under the first for line. But the first list should come last.

[2, 2, 4, 4, 4, 1, 4, 2, 2, 4, 2]

[4, 2, 2, 6]

[2, 2, 2, 4, 4]

[6, 4, 2, 4]

[6, 1, 6, 1, 3, 2, 2, 4, 2, 2, 2]


That print statement can’t do what you want because combinations isn’t defined until afterwards; in a fresh session it would raise a NameError. If you’re running this inside IDLE, it’s probably retaining the last value from your previous run of the code. I’d advise saving it as a script and running it from the command line if you aren’t already. You need to place print(combinations) after you define what combinations is (i.e. after combinations = list()) and after the inner loop has filled it.
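
The rotated output can be reproduced with a small sketch. It uses plain strings as hypothetical stand-ins for the FASTA records, and it simulates the leftover value from an earlier interactive run with a placeholder list. The names sequences and printed are made up for this example.

```python
# The dictionary from the question.
aa_code = {
    'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2, 'G': 4, 'H': 2, 'I': 3, 'K': 2, 'L': 6,
    'M': 1, 'N': 2, 'P': 4, 'Q': 2, 'R': 6, 'S': 6, 'T': 4, 'V': 4, 'W': 1, 'Y': 2,
}

# Hypothetical stand-ins for the sequences in the FASTA file.
sequences = ["VFHL", "YKDTT", "RTKT"]

# Simulates a value left over from an earlier run in the same interactive session.
combinations = ["stale value from a previous run"]

printed = []
for seq in sequences:
    printed.append(combinations)  # stands in for print(combinations) at the top of the loop
    combinations = [aa_code[residue] for residue in seq]

print(printed)
```

Each iteration prints the list built by the previous one, so the first thing printed is the stale value and the list for the final sequence is never printed during this run. If the stale value is the final list from an earlier run, the output looks like the expected lists rotated by one position.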

Your code as you have written it should look more like:

from Bio import SeqIO
for protein_seq in SeqIO.parse("protein.fasta","fasta"):
    x = list(protein_seq.seq)
    combinations = list()
    for value in x:
        combinations.append(aa_code[value])
    print(combinations)

The print(combinations) call still needs to be within the outer loop, but at the same indentation level as the inner loop that looks up the dictionary values. That way, it will create, and print, one list for each of the sequences in the file.

Some other general points:

  • You don't need to convert the sequence to a list(), as strings (and Biopython Seq objects) are iterable in Python. for i in seq and for i in list(seq) are essentially equivalent, so the line x = list(protein_seq.seq) is redundant.
  • Check your indentation for the second loop. It may just be how you've copied it on to the forum, but it looks over-indented to me.
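
A quick check of the first point, using a plain string as a stand-in for protein_seq.seq and only the three dictionary entries this sequence needs:

```python
seq = "RTKT"                        # plain string standing in for protein_seq.seq
aa_code = {'R': 6, 'T': 4, 'K': 2}  # subset of the dictionary from the question

# Iterating the string directly and iterating list(seq) visit the same letters.
from_string = [aa_code[residue] for residue in seq]
from_list = [aa_code[residue] for residue in list(seq)]

print(from_string == from_list)
```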

Thank you. Do you know how many for loops I can add in a single function?


Do you mean nested loops? (A loop in a loop in a loop ....)

I don't know if there is a limit. But if you hit one, then something is definitely wrong with your approach to the problem and you should rethink it. I would guess a depth of 5 is about the maximum you should use.
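
When loops do start to nest deeply, one common way to flatten them is itertools.product from the standard library. This is a swapped-in technique rather than something from the code above; the sketch below enumerates all three-letter combinations of a made-up two-letter alphabet both ways.

```python
from itertools import product

residues = "AC"  # made-up two-letter alphabet, for illustration only

# Three nested loops ...
nested = []
for first in residues:
    for second in residues:
        for third in residues:
            nested.append(first + second + third)

# ... and the same enumeration written as a single loop over product().
flat = ["".join(combo) for combo in product(residues, repeat=3)]

print(nested == flat)
```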

fin swimmer
