Question

how to convert a protein sequence into a list of characters

0

23 months ago

Debut ▴ 20

I would like to convert a protein sequence and a sequence containing mutations (the differences between the protein sequence and a reference sequence) into lists, so that I can compare the two lists. For example:

Seq           GEDAPEEMN
----------
Mut                   LM
----------
output seq     ['G',    'E', 'D', 'A', 'P','E', 'E', 'M', 'N'] (list)
----------
output mut [' L  ', 'M ', '    ', '', '',' ', ' ', ' ', ' ']

I made this code but it does not work:

lignes=myFile.readlines()
for ligne in lignes :
    split_tableau= ligne.split(",")
    seq= split_tableau[4]
    mut= split_tableau[6]

    for caraS in seq :
        caraSeq= caraS.split()
    for caraM in mut :
        caraMut= caraM.split()
        print(caraMut)

python list • 1.5k views

ADD COMMENT • link updated 23 months ago by Jeremy ▴ 880 • written 23 months ago by Debut ▴ 20

0

I reformatted your post - maybe lost some of the white-space formatting you'd added in, sorry about that. Can you please check again so your Seq Mut etc are properly formatted? Also, do the extra white spaces in the output seq and output mut blocks have any significance?

ADD REPLY • link 23 months ago by Ram 43k

0

Thanks for your answer. mut is a sequence that highlights the mutations my sequence has relative to a reference sequence (the reference sequence does not appear because my code doesn't need it). The spaces mark matches between my sequence and the reference sequence. For example, the first and second positions are mutations, and the remaining positions, where there are spaces, are matches: the same amino acid appears in both.

ADD REPLY • link 23 months ago by Debut ▴ 20

0

So in your example, the "L" and "M" are supposed to match with the "G" and "E"?

It's not clear at all where the mutations are supposed to come in or what governs their position.

ADD REPLY • link 23 months ago by Joe 21k

0

Thank you for your feedback. There are two sequences: seq is the protein sequence that was compared with the reference sequence, and mut shows the mutations of that protein sequence relative to the reference (the reference sequence is not shown in my code because I don't need it). I want to put both sequences into lists and compare them index by index to get the positions of the mutations, for example G to L at position 0. Where there is a space, as in positions 3, 4, ..., the protein sequence and the reference sequence match. So first I want to turn them into lists, but the problem is that for seq (and mut) each character ends up alone in its own list.

ADD REPLY • link 23 months ago by Debut ▴ 20

0

You need to stop opening new questions. This is the 3rd post I've seen of yours in the last 2 or 3 days, all related to the same topic.

ADD REPLY • link 23 months ago by Joe 21k

score 0 · Answer 1 · 2022-05-19

0

23 months ago

Jeremy ▴ 880

You can use the following code in Python.

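# Seq and Mut are the two strings from the question
newseq = list(Seq)
# Replace spaces (matches) with '-' so they remain visible in the list
newmut = ['-' if letter == ' ' else letter for letter in Mut]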
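
As a minimal sketch of how this could be run end to end, assuming `Seq` and `Mut` are plain strings. The values below are taken from the question; padding `Mut` with spaces to the length of `Seq` is an assumption about the real data.

```python
# Example values from the question; padding Mut with spaces is an assumption.
Seq = "GEDAPEEMN"
Mut = "LM" + " " * 7

# list() on a string gives one list with one element per character.
newseq = list(Seq)

# Replace each space (a match with the reference) by '-' so it stays visible.
newmut = ["-" if letter == " " else letter for letter in Mut]

print(newseq)
print(newmut)
```

Both lists have the same length here, so they can be compared position by position.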

ADD COMMENT • link 23 months ago by Jeremy ▴ 880

1

Thank you for your answer. When I print to the console, each character is in its own list, but I want all the characters of a sequence in the same list. I have several sequences, so I need a for loop, don't I?

with open('data2.csv', 'r') as myFile:
    lignes = myFile.readlines()
    for ligne in lignes:
        split_tableau = ligne.split(",")
        seq = split_tableau[4]
        mut = split_tableau[6]

        #print(seq)
        #print(mut)
        for s in seq:
            caraSeq = list(s)
        for m in mut:
            caraMut = list(m)
            print(caraMut)

ADD REPLY • link 23 months ago by Debut ▴ 20

0

OK, I thought your sequences were strings. It looks like you're starting with a CSV file. It might be helpful if you could show the first 5 or 10 rows of your CSV file and what you would like the final output to look like.
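
Until the file layout is known, here is a hedged sketch of reading such a file with Python's `csv` module. It assumes, as in the code above, that the sequence is in column index 4 and the mutation string in column index 6. The function name `read_seq_mut` and the sample row are made up for illustration.

```python
import csv
import io


def read_seq_mut(handle, seq_col=4, mut_col=6):
    """Return a list of (seq_list, mut_list) pairs, one pair per CSV row.

    The column indexes are assumptions taken from the code in this thread.
    """
    pairs = []
    for row in csv.reader(handle):
        if len(row) <= max(seq_col, mut_col):
            continue  # skip rows that are too short (e.g. blank lines)
        # list() applied to the whole string yields a single list of characters
        pairs.append((list(row[seq_col]), list(row[mut_col])))
    return pairs


# Made-up sample row standing in for data2.csv; only columns 4 and 6 matter.
sample = io.StringIO("a,b,c,d,GEDAPEEMN,f," + "LM".ljust(9) + ",h\n")
pairs = read_seq_mut(sample)
for seq_list, mut_list in pairs:
    print(seq_list)
    print(mut_list)
```

With a real file, the open file object can be passed instead of the sample, for example `with open('data2.csv', newline='') as handle: pairs = read_seq_mut(handle)`. The key point is that `list()` is applied to the whole string once per row, not to each character inside an inner loop.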

ADD REPLY • link 23 months ago by Jeremy ▴ 880

score 0 · Answer 2 · 2022-05-20

Maybe this solves the question but I'm still not sure:

# Make lists from strings
>>> seq = list("GEDAPEEMN")
>>> seq
['G', 'E', 'D', 'A', 'P', 'E', 'E', 'M', 'N']
>>> mut = list("LM")
>>> mut
['L', 'M']

# Pad the mutation list
>>> mut += [' '] * (len(seq) - len(mut))

# Display
>>> print(seq,mut, sep="\n")
['G', 'E', 'D', 'A', 'P', 'E', 'E', 'M', 'N']
['L', 'M', ' ', ' ', ' ', ' ', ' ', ' ', ' ']

This will only work for right-hand padding, though. Without more information we can't write a general solution, because we know nothing about how the mutation coordinates are determined.
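
Under that same right-hand-padding assumption (the mutation string is aligned with the start of the sequence, which the question does not confirm), the padding and the position lookup could be sketched as follows. The helper name `mutation_positions` is hypothetical.

```python
def mutation_positions(seq, mut):
    """Return (index, seq_residue, mut_residue) for every non-blank mut character.

    Assumes mut is aligned with the start of seq and only needs right-hand padding.
    """
    padded = mut.ljust(len(seq))  # pad with spaces on the right
    return [
        (i, s, m)
        for i, (s, m) in enumerate(zip(seq, padded))
        if m != " "
    ]


print(mutation_positions("GEDAPEEMN", "LM"))
# [(0, 'G', 'L'), (1, 'E', 'M')]
```

If the mutation strings in the real data already carry their leading spaces, the same function should report the correct indexes without any change, since `ljust` only adds spaces on the right.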