python
17 months ago

I have this DNA sequence:

dna_seq = "CUGAACUSCACUGECAUUCA"


and I want to cut all the letters before "S" in one line, using a loop and an if condition.

I wrote this script:

Before_S = True
for i in range(len(dna_seq)):
    if dna_seq[i] == 'S':
        Before_S = False
        continue
    if Before_S:
        print(dna_seq[i])


But I want to write this whole script as a comprehension, e.g.:

firstexon = [my script]


How can I do that?

17 months ago

Hello, could you please update the title of the post to make it more explicit, for example "Removing characters in a string in python"?

This question has been answered many times, and there are many ways to solve it.

You do not have to reinvent the wheel: you can replace everything up to and including the S with nothing, using a regular expression (regex).

See the docs

Assuming you only have one S in your string, something like this (not tested):

import re
s = "CUGAACUSCACUGECAUUCA"
# Replace everything up to and including 'S' with an empty string.
replaced = re.sub(r'.*S', '', s)
print(replaced)


Another solution, again if you only have one S, is to split your string on the S character. The following splits your string in two; you want the second part.

s = "CUGAACUSCACUGECAUUCA"
replaced = s.split('S')[1]
print(replaced)
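
Both snippets above rely on the string containing exactly one S. As a rough illustration of why that matters, here is a sketch using a made-up sequence with two S characters (the sequence and the variable names are hypothetical, not from the question):

```python
import re

# Hypothetical sequence with two 'S' characters (assumption, for illustration only).
s = "CUGAASCUSCACUG"

# Greedy '.*S' matches up to the LAST 'S', so everything before it is removed.
greedy = re.sub(r'.*S', '', s)

# Non-greedy '.*?S' with count=1 stops at the FIRST 'S'.
lazy = re.sub(r'.*?S', '', s, count=1)

# split('S')[1] returns only the piece between the first and second 'S'.
middle = s.split('S')[1]

# partition() splits once, at the first 'S', and keeps the rest intact.
before, sep, after = s.partition('S')

print(greedy)  # CACUG
print(lazy)    # CUSCACUG
print(middle)  # CU
print(after)   # CUSCACUG
```

If more than one S can occur, it is probably worth deciding first which S should act as the cut point, and then picking the variant that matches.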


But if you want to stick to your approach for learning purposes, you can loop over the characters of your string, skipping each character until you find an S, and from then on append the downstream characters to another list, your edited sequence. Good luck.
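
A minimal sketch of that loop, assuming a single S and using illustrative names (`found_s`, `edited`) that are not from the original post:

```python
s = "CUGAACUSCACUGECAUUCA"

edited = []       # collects the characters downstream of 'S'
found_s = False   # flips to True once the 'S' has been seen

for ch in s:
    if found_s:
        # We are past the 'S': record the character.
        edited.append(ch)
    elif ch == 'S':
        # This is the 'S' itself: start recording from the next character.
        found_s = True

result = ''.join(edited)
print(result)  # CACUGECAUUCA
```

To keep the upstream part instead, the same idea works with the condition inverted: append while `found_s` is still False and stop at the S.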

17 months ago
Dunois ★ 2.0k

Your phrasing is unclear. If a list comprehension solution is what you're after, are you looking for something like this?

#Python 3.9.1

#Generating some random strings containing 'A', 'U', 'C', 'G' separated by a single 'S'.
import random
def strgen():
    return ''.join(random.choices(['A', 'U', 'C', 'G'], k = random.choice([4,5,6,7,8])))
seqs = [strgen()+'S'+strgen() for i in range(10)]

print(seqs)
# ['UCGCSCGACGAUA', 'CCUGGGSAGAA', 'GUGCSCCUUA', 'GUGUACASUUCUGUG', 'UCGUSGCCUG', 'GACASAACGC', 'UAGUSGUCCC', 'GCGUSGCGGCACA', 'UAGCUAUSCUGUAUA', 'AGUUSCCCGGCGU']

#Splitting using list comprehension.
seqs_split = [x.split('S') for x in seqs]

print(seqs_split)
# [['UCGC', 'CGACGAUA'], ['CCUGGG', 'AGAA'], ['GUGC', 'CCUUA'], ['GUGUACA', 'UUCUGUG'], ['UCGU', 'GCCUG'], ['GACA', 'AACGC'], ['UAGU', 'GUCCC'], ['GCGU', 'GCGGCACA'], ['UAGCUAU', 'CUGUAUA'], ['AGUU', 'CCCGGCGU']]


Modifying [x.split('S') for x in seqs] to [x.split('S')[0] for x in seqs] or [x.split('S')[1] for x in seqs] will yield just the former or latter portions of the strings w.r.t. S respectively.
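
Applied to the single string from the question, and assuming the goal is to keep the part before S (the name `firstexon` hints at that, but it is a guess), a one-line version might look like one of the following. The names `cut`, `firstexon_lc` and `firstexon_tw` are made up for this sketch:

```python
from itertools import takewhile

dna_seq = "CUGAACUSCACUGECAUUCA"

# Simplest option: split on 'S' and keep the upstream piece.
firstexon = dna_seq.split('S')[0]

# A literal list comprehension: keep characters whose index is below that of 'S'.
# str.index raises ValueError if there is no 'S' in the string.
cut = dna_seq.index('S')
firstexon_lc = ''.join([base for i, base in enumerate(dna_seq) if i < cut])

# takewhile stops at the first 'S', mirroring the original loop with its flag.
firstexon_tw = ''.join(takewhile(lambda base: base != 'S', dna_seq))

print(firstexon)     # CUGAACU
print(firstexon_lc)  # CUGAACU
print(firstexon_tw)  # CUGAACU
```

All three give the same result here; the `split` form is the shortest, while the comprehension is closest to what the question literally asks for.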