Question: Problem getting the full list of amino acid values from a dict in Python
15 months ago by
macadamopetrus0 wrote:

Dear all,

I am having a problem with some code I wrote. I made a dictionary in Python whose keys are amino acids (single-letter codes) and whose values give how many times each code is present. As a final step, I read a FASTA file and try to print the list of values for each sequence in the file. However, I only get the values for the last sequence and nothing for the other 4 sequences (there are 5 in total).

aa_code = {
'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2,
'G': 4, 'H': 2, 'I': 3, 'K': 2, 'L': 6, 
'M': 1, 'N': 2, 'P': 4, 'Q': 2, 'R': 6, 
'S': 6, 'T': 4, 'V': 4, 'W': 1, 'Y': 2,
}
ADD COMMENTlink modified 15 months ago by Joe16k • written 15 months ago by macadamopetrus0
from Bio import SeqIO
for protein_seq in SeqIO.parse("protein.fasta","fasta"):
      for i in list(protein_seq):
         combinations = list()
         for value in list(protein_seq):
              combinations.append(aa_code[value])
      print(combinations)

This is the output I expect:

[4, 2, 2, 6]

[2, 2, 2, 4, 4]

[6, 4, 2, 4]

[6, 1, 6, 1, 3, 2, 2, 4, 2, 2, 2]

[2, 2, 4, 4, 4, 1, 4, 2, 2, 4, 2]

ADD REPLYlink modified 15 months ago • written 15 months ago by macadamopetrus0
15 months ago by
finswimmer13k
Germany
finswimmer13k wrote:

Hello macadamopetrus ,

I don't understand what this dictionary is for.

But your print(combinations) is outside the first for loop, and therefore printed after finishing this loop. You have to put it inside the loop.
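A minimal sketch of the difference, assuming the five example sequences from this thread as a hard-coded list in place of the FASTA file. The list `sequences` and both function names are made up for illustration; each function collects what print() would have shown.

```python
aa_code = {
    'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2,
    'G': 4, 'H': 2, 'I': 3, 'K': 2, 'L': 6,
    'M': 1, 'N': 2, 'P': 4, 'Q': 2, 'R': 6,
    'S': 6, 'T': 4, 'V': 4, 'W': 1, 'Y': 2,
}

# Hypothetical stand-in for the records read from protein.fasta.
sequences = ["VFHL", "YKDTT", "RTKT", "RMLMIDYVCFQ", "DKGTPMAEQTY"]


def print_outside_loop(seqs):
    """Collect the output when print() sits after the outer loop."""
    printed = []
    for seq in seqs:
        combinations = [aa_code[aa] for aa in seq]
    printed.append(combinations)  # runs once, after the loop has finished
    return printed


def print_inside_loop(seqs):
    """Collect the output when print() sits inside the outer loop."""
    printed = []
    for seq in seqs:
        combinations = [aa_code[aa] for aa in seq]
        printed.append(combinations)  # runs once per sequence
    return printed


for row in print_inside_loop(sequences):
    print(row)
```

With the print outside the loop only the list for the last sequence survives, because `combinations` is overwritten on every pass.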

fin swimmer

ADD COMMENTlink written 15 months ago by finswimmer13k

No, this doesn't work, as it gives a lot of lists. When I add print(combinations) right after the first for loop line instead, it prints the lists, but not in the order I expect.

ADD REPLYlink modified 15 months ago • written 15 months ago by macadamopetrus0
15 months ago by
Joe16k
United Kingdom
Joe16k wrote:

Does this do what you’re trying to do? (I changed the printing format slightly for testing)

from Bio import SeqIO

aa_code = { 'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2, 'G': 4, 'H': 2, 'I': 3, 'K': 2, 'L': 6, 'M': 1, 'N': 2, 'P': 4, 'Q': 2, 'R': 6, 'S': 6, 'T': 4, 'V': 4, 'W': 1, 'Y': 2}

for protein_seq in SeqIO.parse("testproteins.fa", "fasta"):
    x = protein_seq.seq
    combinations = [aa_code[value] for value in x]
    print("{} -> {}".format(x, combinations))

For your example sequences above I get:

VFHL -> [4, 2, 2, 6]
YKDTT -> [2, 2, 2, 4, 4]
RTKT -> [6, 4, 2, 4]
RMLMIDYVCFQ -> [6, 1, 6, 1, 3, 2, 2, 4, 2, 2, 2]
DKGTPMAEQTY -> [2, 2, 4, 4, 4, 1, 4, 2, 2, 4, 2]
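One caveat with the lookup above: `aa_code[value]` raises a `KeyError` for any residue that is not in the dictionary (for instance an ambiguous `X`, which can turn up in real FASTA files). A hedged variant, where the helper name `score_sequence` and the `None` placeholder are illustrative choices rather than anything from this thread:

```python
aa_code = {
    'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2,
    'G': 4, 'H': 2, 'I': 3, 'K': 2, 'L': 6,
    'M': 1, 'N': 2, 'P': 4, 'Q': 2, 'R': 6,
    'S': 6, 'T': 4, 'V': 4, 'W': 1, 'Y': 2,
}


def score_sequence(seq, missing=None):
    """Map each residue to its aa_code value; unknown residues get `missing`."""
    return [aa_code.get(residue, missing) for residue in seq]


print(score_sequence("VFHL"))  # all residues are in the dictionary
print(score_sequence("VFXL"))  # 'X' is not, so it maps to None
```

Whether to substitute a placeholder or let the error surface depends on how strict you want the input check to be.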

Old answer below, misunderstood OP's objective.

If you’re trying to do what I think you’re trying to do, do this instead:

from Bio import SeqIO
from collections import Counter
import sys

for r in SeqIO.parse(sys.argv[1], 'fasta'):
    print("{0}\n{1}".format(r.id, Counter(r.seq)))
ADD COMMENTlink modified 15 months ago • written 15 months ago by Joe16k

No this doesn't work. Because what I am trying to do is take the values from the dictionary that are equivalent to the keys in my list (called "x"). And make a list of the values and print those. But I think there is some problem with the for loop. Because the code that I made does work. If this can print the last element of the list with values then why not all the previous 4.

ADD REPLYlink written 15 months ago by macadamopetrus0

I’m not clear on what your dictionary is for?

They are “scores” for different amino acids of interest or something? In that case Fin swimmer’s suggestion should be the answer you need. You are only printing a single value of the combinations list outside of the loop.

EDIT: moved code up in to answer body to keep thread organised.

ADD REPLYlink modified 15 months ago • written 15 months ago by Joe16k

Yes, I tried moving print(combinations) to directly after the first for loop line in the code. This gives the output below, where the first list should have been printed last.

[2, 2, 4, 4, 4, 1, 4, 2, 2, 4, 2]

[4, 2, 2, 6]

[2, 2, 2, 4, 4]

[6, 4, 2, 4]

[6, 1, 6, 1, 3, 2, 2, 4, 2, 2, 2]

So it still doesn't print what I expect, which is:

[4, 2, 2, 6]

[2, 2, 2, 4, 4]

[6, 4, 2, 4]

[6, 1, 6, 1, 3, 2, 2, 4, 2, 2, 2]

[2, 2, 4, 4, 4, 1, 4, 2, 2, 4, 2]

ADD REPLYlink modified 15 months ago • written 15 months ago by macadamopetrus0

Thank you. But why didn't my code work? I thought I wrote it logically.

ADD REPLYlink written 15 months ago by macadamopetrus0

You need to show us what you changed the code to, but it looks like you kept 2 print statements instead of just moving the last statement inside the loop as we said.

ADD REPLYlink written 15 months ago by Joe16k
from Bio import SeqIO
for protein_seq in SeqIO.parse("protein.fasta","fasta"):
     print(combinations)
     x = list(protein_seq.seq)
     combinations = list()
     for value in x:
            combinations.append(aa_code[value])

This is the output after adding print(combinations) directly under the first for loop. But the first list should be printed last.

[2, 2, 4, 4, 4, 1, 4, 2, 2, 4, 2]

[4, 2, 2, 6]

[2, 2, 2, 4, 4]

[6, 4, 2, 4]

[6, 1, 6, 1, 3, 2, 2, 4, 2, 2, 2]

ADD REPLYlink modified 15 months ago • written 15 months ago by macadamopetrus0

On a fresh run that print statement would fail with a NameError, because combinations isn’t defined until afterwards. If you’re running this inside IDLE, it’s probably retaining the last value from your previous run of the code, which would explain why the last sequence's list shows up first. I’d advise saving it as a script and running it from the command line if you aren’t already. You need to place print(combinations) after you define what combinations is (i.e. after combinations = list()) and after the inner loop has filled it.

Your code as you have written it should look more like:

from Bio import SeqIO
for protein_seq in SeqIO.parse("protein.fasta","fasta"):
    x = list(protein_seq.seq)
    combinations = list()
    for value in x:
        combinations.append(aa_code[value])
    print(combinations)

The print(combinations) call still needs to be within the outer loop, but should be level with the inner loop, which is appending the dictionary values. That way, it will create, and print, a list that corresponds to each of the sequences in the file.
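To see why printing before the assignment gives the lists shifted by one, here is a rough simulation. It assumes the thread's five example sequences as a hard-coded list, and it uses a plain dict (`namespace`, a made-up name) to play the role of an interpreter session that may still hold a value from an earlier run.

```python
aa_code = {
    'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2,
    'G': 4, 'H': 2, 'I': 3, 'K': 2, 'L': 6,
    'M': 1, 'N': 2, 'P': 4, 'Q': 2, 'R': 6,
    'S': 6, 'T': 4, 'V': 4, 'W': 1, 'Y': 2,
}

# Hypothetical stand-in for the records read from protein.fasta.
sequences = ["VFHL", "YKDTT", "RTKT", "RMLMIDYVCFQ", "DKGTPMAEQTY"]


def run_print_first(seqs, namespace):
    """Mimic the loop with print() placed before combinations is rebuilt."""
    printed = []
    for seq in seqs:
        # In a fresh session this lookup fails, mirroring the NameError.
        printed.append(namespace["combinations"])
        namespace["combinations"] = [aa_code[aa] for aa in seq]
    return printed


# Session that still holds the last list from a previous run.
stale_session = {"combinations": [aa_code[aa] for aa in sequences[-1]]}
for row in run_print_first(sequences, stale_session):
    print(row)
```

Each pass prints whatever the previous pass (or the previous run) left behind, so the last sequence's list comes out first and every other list is one position late.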

Some other general points:

  • You don't need to convert the sequence to a list(), as strings (and Biopython Seq objects) are inherently iterable in Python. for i in seq and for i in list(seq) are essentially equivalent, so the list() call in x = list(protein_seq.seq) is redundant.
  • Check your indentation for the second loop. It may just be how you've copied it on to the forum, but it looks over-indented to me.
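A quick check of the first point, with a plain str standing in for protein_seq.seq (an assumption made so the snippet runs without Biopython):

```python
seq = "VFHL"  # plain str standing in for protein_seq.seq

# Iterating the string directly visits the same items as iterating list(seq).
from_string = [residue for residue in seq]
from_list = [residue for residue in list(seq)]

print(from_string == from_list)
```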
ADD REPLYlink modified 15 months ago • written 15 months ago by Joe16k

Thank you. Do you know how many for loops I can add in a single function?

ADD REPLYlink written 15 months ago by macadamopetrus0

Do you mean nested loops? (A loop in a loop in a loop ....)

I don't know if there is a limit. But if you hit one, then something is definitely wrong with your approach to solving the problem and you should rethink it. I would guess a depth of 5 is about the maximum you should use.
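As one illustration of rethinking deep nesting, loops that walk every combination of the same items can often be flattened with itertools.product. This is only a sketch on a made-up two-letter alphabet (`residues`), not something taken from the code in this thread:

```python
from itertools import product

residues = "AC"  # made-up alphabet, kept tiny for the example

# Three nested loops building every 3-letter combination.
nested = []
for first in residues:
    for second in residues:
        for third in residues:
            nested.append(first + second + third)

# The same combinations, in the same order, from a single flat loop.
flat = ["".join(combo) for combo in product(residues, repeat=3)]

print(nested == flat)
```

Changing `repeat` changes the depth without adding another level of indentation.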

fin swimmer

ADD REPLYlink written 15 months ago by finswimmer13k