Question

python


2.9 years ago

FadyNabil ▴ 20

I have this DNA sequence:

dna_seq_edited = "CUGAACUSCACUGECAUUCA"

and I want to cut all the letters before "S" in one line, using an if condition and a loop.

I wrote this script:

dna_seq = "CUGAACUSCACUGECAUUCA"  # same sequence as dna_seq_edited above

# Print every character that comes before the S
Before_S = True
for i in range(len(dna_seq)):
    if dna_seq[i] == 'S':
        Before_S = False
        continue
    if Before_S:
        print(dna_seq[i])

But I want to turn this whole script into a comprehension, e.g.:

firstexon = [my script]

How can I do that?

python • 924 views

updated 2.9 years ago by Ram 43k • written 2.9 years ago by FadyNabil ▴ 20

score 1 · Answer 1 · 2021-05-10

Hello, could you please update the title of the post to make it more explicit, e.g. "Removing characters from a string in Python"?

This question has been answered many times, and there are many ways to solve it.

You do not have to reinvent the wheel: you can use a regular expression (regex) to replace everything up to and including the S with nothing (an empty string).

See the docs for Python's re module.

Assuming there is only one S in your string, something like this (not tested):

import re
s = "CUGAACUSCACUGECAUUCA"
# '.*S' matches everything up to and including the S; replace it with an empty string
replaced = re.sub('.*S', '', s)
print(replaced)
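One caveat worth knowing: the pattern .*S is greedy, so if a sequence ever contains more than one S it removes everything up to the last S. A minimal sketch, assuming you instead want to cut only up to the first S: a non-greedy .*? combined with count=1 does that. The two-S example string is made up purely for illustration.

```python
import re

# Hypothetical sequence with two S characters, for illustration only
s = "AAASCCCSGGG"

# Greedy: .* matches as much as possible, so everything up to the LAST S is removed
greedy = re.sub('.*S', '', s)

# Non-greedy with count=1: only the text up to and including the FIRST S is removed
first_only = re.sub('.*?S', '', s, count=1)

print(greedy)      # GGG
print(first_only)  # CCCSGGG
```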

Another solution, again assuming there is only one S: split your string on the S character. The following splits your string in two, and you want the second part.

s = "CUGAACUSCACUGECAUUCA"
replaced = s.split('S')[1]
print(replaced)
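If a sequence might contain no S, s.split('S')[1] raises an IndexError, and if it contains several, everything after the second S is silently dropped. A slightly more defensive sketch uses str.partition, which splits only at the first occurrence and never raises:

```python
s = "CUGAACUSCACUGECAUUCA"

# partition returns (before, separator, after), splitting at the first 'S' only
before, sep, after = s.partition('S')

if sep:  # sep is '' when no 'S' was found
    print(after)  # CACUGECAUUCA
else:
    print("no S in sequence")
```

s.split('S', 1) (i.e. maxsplit=1) also splits only at the first S, but indexing [1] still fails when there is no S at all.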

But if you want to stick with your approach for learning purposes, you can loop over the characters of your string, skipping each one until you find the S, and from then on append the downstream characters to another list: your edited sequence. Good luck.
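The loop described above could look roughly like this. It is a minimal sketch that keeps the characters downstream of the first S; the names edited and found_s are just illustrative.

```python
dna_seq = "CUGAACUSCACUGECAUUCA"

edited = []      # characters downstream of the S
found_s = False  # becomes True once the S has been seen

for base in dna_seq:
    if not found_s:
        if base == 'S':
            found_s = True  # start recording from the next character
        continue            # skip characters before (and including) the S
    edited.append(base)

print(''.join(edited))  # CACUGECAUUCA
```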

score 0 · Answer 2 · 2021-05-10

Your phrasing is unclear. If a list comprehension solution is what you're after, are you looking for something like this?

#Python 3.9.1

# Generate some random strings of 'A', 'U', 'G', 'C', each made of two random parts separated by a single 'S'.
import random
def strgen():
    return(''.join(random.choices(['A', 'U', 'C', 'G'], k = random.choice([4,5,6,7,8]))))
seqs = [strgen()+'S'+strgen() for i in range(10)]

print(seqs)
# ['UCGCSCGACGAUA', 'CCUGGGSAGAA', 'GUGCSCCUUA', 'GUGUACASUUCUGUG', 'UCGUSGCCUG', 'GACASAACGC', 'UAGUSGUCCC', 'GCGUSGCGGCACA', 'UAGCUAUSCUGUAUA', 'AGUUSCCCGGCGU']

#Splitting using list comprehension.
seqs_split = [x.split('S') for x in seqs]

print(seqs_split)
# [['UCGC', 'CGACGAUA'], ['CCUGGG', 'AGAA'], ['GUGC', 'CCUUA'], ['GUGUACA', 'UUCUGUG'], ['UCGU', 'GCCUG'], ['GACA', 'AACGC'], ['UAGU', 'GUCCC'], ['GCGU', 'GCGGCACA'], ['UAGCUAU', 'CUGUAUA'], ['AGUU', 'CCCGGCGU']]

Changing [x.split('S') for x in seqs] to [x.split('S')[0] for x in seqs] or [x.split('S')[1] for x in seqs] yields only the part of each string before or after the S, respectively.
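For a single string, the loop in the question (which prints the characters before the S) can itself be written as a comprehension. A minimal sketch, assuming the sequence contains at least one S (str.index raises ValueError otherwise); itertools.takewhile is an alternative that stops at the first S and also tolerates a missing S.

```python
from itertools import takewhile

dna_seq = "CUGAACUSCACUGECAUUCA"

# Index-based comprehension, mirroring the original for/if loop
firstexon = [dna_seq[i] for i in range(dna_seq.index('S'))]

# takewhile yields characters until the first 'S' is encountered
firstexon_tw = [base for base in takewhile(lambda b: b != 'S', dna_seq)]

print(firstexon)           # ['C', 'U', 'G', 'A', 'A', 'C', 'U']
print(''.join(firstexon))  # CUGAACU
```

If you want a string rather than a list of characters, ''.join(...) around either comprehension, or simply dna_seq.split('S')[0], gives the same prefix.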