How to convert a protein sequence into a list of characters
2.5 years ago
Debut ▴ 20

I would like to convert a protein sequence and a sequence containing mutations (the differences between the protein sequence and a reference sequence) into lists, so that I can compare the two lists. For example:

Seq           GEDAPEEMN
----------
Mut                   LM
----------
output seq     ['G',    'E', 'D', 'A', 'P','E', 'E', 'M', 'N'] (list)
----------
output mut [' L  ', 'M ', '    ', '', '',' ', ' ', ' ', ' ']

I wrote this code, but it does not work:

lignes=myFile.readlines()
for ligne in lignes :
    split_tableau= ligne.split(",")
    seq= split_tableau[4]
    mut= split_tableau[6]

    for caraS in seq :
        caraSeq= caraS.split()
    for caraM in mut :
        caraMut= caraM.split()
        print(caraMut)
python list • 1.8k views

I reformatted your post - maybe lost some of the white-space formatting you'd added in, sorry about that. Can you please check again so your Seq Mut etc are properly formatted? Also, do the extra white spaces in the output seq and output mut blocks have any significance?


Thanks for your answer. mut is a sequence that highlights the mutations my sequence has relative to a reference sequence (I don't need the reference sequence for my code, which is why it doesn't appear). The spaces mark the matches between my sequence and the reference sequence. For example, the first and second positions are mutations, and the remaining positions, where there are spaces, are matches: the same amino acid appears in both.


So in your example, the "L" and "M" are supposed to match with the "G" and "E"?

It's not clear at all where the mutations are supposed to come in or what governs their position.


Thank you for your feedback. There are two sequences: mut shows the mutations of the protein sequence compared to the reference sequence, and seq is the protein sequence that was compared with the reference sequence (the reference sequence is not shown in my code because I don't need it).

I want to put the two sequences into lists and compare them index by index to get the positions of the mutations, for example G to L at position 0. A space, as at positions 3, 4, ..., means that the protein sequence and the reference sequence match. So first I want to turn them into lists, but the problem is that for seq (and mut) each character ends up in a list on its own.



You need to stop opening new questions. This is the 3rd post I've seen of yours in the last 2 or 3 days, all related to the same topic.

2.5 years ago
Jeremy ▴ 930

You can use the following code in Python.

# Seq and Mut are strings
newseq = list(Seq)
# Replace each space (a match with the reference) with '-'
newmut = ['-' if letter == ' ' else letter for letter in Mut]

Thank you for your answer. When I print to the console, each character is in a list on its own, but I want all the characters of a sequence in the same list. I have several sequences, so I need a for loop, don't I?


with open('data2.csv', 'r') as myFile:
    lignes = myFile.readlines()
    for ligne in lignes:
        split_tableau = ligne.split(",")
        seq = split_tableau[4]
        mut = split_tableau[6]

        #print(seq)
        #print(mut)
        for s in seq:
            caraSeq = list(s)
        for m in mut:
            caraMut = list(m)
            print(caraMut)

OK, I thought your sequences were strings. It looks like you're starting with a CSV file. It might be helpful if you could show the first 5 or 10 rows of your CSV file and what you would like the final output to look like.
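Until the file format is clear, here is a minimal sketch of what the loop might look like. It assumes the file is comma-separated with no header row, that the sequence is in column 4 and the mutation string in column 6 (as in your snippet), and it uses a hypothetical helper called `row_to_lists`. The demo row is made up. The main point is to call `list()` once on the whole string, rather than on each character inside a `for` loop.

```python
import csv
import io


def row_to_lists(row, seq_col=4, mut_col=6):
    """Return (seq_list, mut_list) for one parsed CSV row.

    seq_col and mut_col are assumptions taken from the snippet above.
    """
    return list(row[seq_col]), list(row[mut_col])


# Stand-in for open('data2.csv', newline=''); the row content is invented.
demo_file = io.StringIO("id1,a,b,c,GEDAPEEMN,d,LM" + " " * 7 + "\n")

for row in csv.reader(demo_file):
    cara_seq, cara_mut = row_to_lists(row)
    print(cara_seq)  # one list holding every residue of the sequence
    print(cara_mut)  # one list holding every character of the mutation string
```

With a real file you would replace `demo_file` with the handle returned by `open(...)`.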

2.5 years ago
Joe 21k

Maybe this solves the question, but I'm still not sure:

# Make lists from strings
>>> seq = list("GEDAPEEMN")
>>> seq
['G', 'E', 'D', 'A', 'P', 'E', 'E', 'M', 'N']
>>> mut = list("LM")
>>> mut
['L', 'M']

# Pad the mutation list
>>> mut += [' '] * (len(seq) - len(mut))

# Display
>>> print(seq,mut, sep="\n")
['G', 'E', 'D', 'A', 'P', 'E', 'E', 'M', 'N']
['L', 'M', ' ', ' ', ' ', ' ', ' ', ' ', ' ']

This will only work for right-hand padding, though. Without more information we can't write a general solution, because we know nothing about how the mutation coordinates are determined.
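If the mutation string really is aligned to the sequence position by position, with a space wherever the residue matches the reference (an assumption based on the comments above, not something the post confirms), then the mutated positions can be recovered by walking both strings together. A rough sketch, with `find_mutations` as a hypothetical helper name:

```python
def find_mutations(seq, mut):
    """Return (position, seq_residue, mut_residue) for every non-space in mut.

    Assumes mut is aligned to seq index by index and that a space means
    'same residue as the reference'. Positions are 0-based.
    """
    # Pad on the right so a shorter mut string still lines up with seq.
    mut = mut.ljust(len(seq))
    return [(i, s, m) for i, (s, m) in enumerate(zip(seq, mut)) if m != " "]


print(find_mutations("GEDAPEEMN", "LM"))
# [(0, 'G', 'L'), (1, 'E', 'M')]
```

This still relies on right-hand padding when `mut` is shorter than `seq`, so it only gives the right positions if the mutation string starts at the first residue.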
