Question: (Closed) How do I count the number of "-" characters in a reference string and remove that many characters from a second string?
23 days ago, projetoic wrote:

When I count the number of "-" characters at the beginning or at the end of a string, the result is 0. What should I do?

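import itertools as it

f = open('Denv4-X-gb_AY947539.txt', 'r')
z = f.read()
f.close()
# count the run of '-' at the start and at the end of the file contents
e = sum(1 for _ in it.takewhile(lambda c: c == '-', z))
y = sum(1 for _ in it.takewhile(lambda c: c == '-', reversed(z)))
print(e, y)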

Output:

> 0

The file is being read normally ....

What's in the file:

>lcl|NC_002640.1_cds_NP_073286.1_1 [gene=POLY] [locus_tag=DV4_gp1] [db_xref=GeneID:5075729] [protein=polyprotein] [protein_id=NP_073286.1] [location=102..10265] [gbkey=CDS]
------------------------------------------------------------
---------------------------------atgaaccaacgaaaaaaggtggttaga             
ccacctttcaatatgctgaaacgcgagagaaaccgcgtatcaacccctcaagggttggtg
aagagattctcaaccggacttttttctgggaaaggacccttacggatggtgctagcattc
atcacgtttttgcgagtcctttccatcccaccaacagcagggattctgaagagatgggga
cagttgaagaaaaataaggccatcaagatactgattggattcaggaaggagataggccgc
------------------------------------------------------------
>gb:AY947539|Organism:Dengue virus 4|Strain Name:H241|Segment:null|Subtype:4|Host:Human
ggtcgtgtggaccgacaaggacagttccaaatcggaagcttgcttaacacagttctaaca
gtttgtttagatagagagcagatctctggaaaaatgaaccaacgaaaaaaggtggttaga
ccacctttcaatatgctgaaacgcgagagaaaccgcgtatcaacccctcaagggttggtg
aagagattctcaaccggacttttttccgggaaaggacccttacggatggtgctagcattc
atcacgtttttgcgagtcctttccatcccaccaacagcagggattctgaaaagatgggga
cagttgaagaaaaacaaggccatcaaaatactgactggattcaggaaggagataggccgc
atgctgaacatcttgaatggaagaaaaaggtcaacaatgacattgctgtgcttgattccc

I would like to know how to print the number of "-" characters at the beginning and at the end of the first sequence:

before starting

>lcl|NC_002640.1_cds_NP_073286.1_1 [gene=POLY] [locus_tag=DV4_gp1] [db_xref=GeneID:5075729][protein=polyprotein] [protein_id=NP_073286.1] [location=102..10265] [gbkey=CDS]

>gb:AY947539|Organism:Dengue virus 4|Strain Name:H241|Segment:null|Subtype:4|Host:Human.

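# each line is expected to look like "<name> <sequence>"
with open('Denv4-X-gb_AY947539.txt.txt', 'r') as f:
    con = [line.strip() for line in f]
# length of the sequence part of the first line
length = len(con[0].split(" ")[1])
# keep the name of the second line and drop its first `length` characters
result = f'{con[1].split(" ")[0]} {con[1].split(" ")[1][length:]}'
print(result)

# append the result to the output file
with open('file.txt', 'a') as f:
    f.write(f'\n{result}')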
modified 23 days ago by Jorge Amigo • written 23 days ago by projetoic

Can you please tidy your post? Please use the code button (101 010), not the quotation one.

written 23 days ago by Kevin Blighe

I have made the changes.

written 23 days ago by projetoic

Closed.

See How can I count the symbols in a given string / text and, as a result of that count, remove the characters

modified 23 days ago • written 23 days ago by Kevin Blighe
23 days ago, Jorge Amigo (Santiago de Compostela, Spain) wrote:

If a perl solution can be considered, here are my 2 cents:

perl -pe 'if (/^>/) { $. > 1 and print "\n" } else { chomp }' sequences.fa \
| perl -pe '/^(-*)\w+(-*)$/ and printf "%s %s\n", length($1), length($2)'

The first perl section linearizes the fasta file, and the second perl section writes the number of "-" characters at the beginning and at the end of the sequence.
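
For readers who prefer Python, the same two steps can be sketched roughly as follows. This is only an illustration, not a translation of the Perl one-liner: the function names `read_fasta` and `count_edge_gaps` are made up for this example, and the sketch assumes a plain FASTA layout like the one in the question. It also suggests why reading the whole file into one string gives 0: that string starts with the ">" of the header line, not with a "-".

```python
def read_fasta(text):
    """Return (header, sequence) pairs, each sequence joined onto one line."""
    records = []
    header, chunks = None, []
    # splitlines() also copes with DOS and old Mac line endings
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # a new record starts: store the previous one, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def count_edge_gaps(seq, gap="-"):
    """Count the gap characters at the start and at the end of seq."""
    leading = len(seq) - len(seq.lstrip(gap))
    trailing = len(seq) - len(seq.rstrip(gap))
    return leading, trailing


# hypothetical two-record input, with the first sequence split over two lines
example = ">ref\n---AA\nTG-GG----\n>other\nGGGAATG-GGAAAA\n"
for header, seq in read_fasta(example):
    print(header, *count_edge_gaps(seq))
```

Internal gaps such as the one in AATG-GG are not counted, because `lstrip` and `rstrip` only remove characters from the two ends.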

modified 23 days ago • written 23 days ago by Jorge Amigo
import itertools as it

f = open('Denv4-X-gb_AY947539.txt', 'r')
z = f.read()
count_inicio = sum(map(lambda x: 1 if '-' in x else 0, z))
count_fim = sum(map(lambda x: 1 if '-' in x else 0, reversed(z)))
print(count_inicio, count_fim)

I tried this solution in Python, but the output is 459,459

For example, I need to take the reference sequence >lcl|NC_002640.1_cds_NP_073286.1_1, say ---AATG-GG----, and count the number of "-" at the beginning and at the end.

Then I need to trim Myseq1 (>gb:AY947539|Organism:Dengue virus 4|), say GGGAATG-GGAAAA, by that number of characters at each end.

Here the reference has 3 "-" at the start and 4 at the end, so the output I want is AATG-GG. But first I need to count the "-" at the beginning and at the end.

How do I count symbols in a given string / text and as a result of that count remove characters from another string / text in the same file?

modified 22 days ago by Jorge Amigo • written 23 days ago by projetoic
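
The count-then-trim step described in the comment above could be sketched in Python along these lines. It assumes both sequences have already been joined onto a single line each and are aligned to the same length; `trim_by_reference` is a hypothetical helper name, not something from the thread.

```python
def trim_by_reference(reference, target, gap="-"):
    """Drop from target as many characters as reference has gaps at each end."""
    leading = len(reference) - len(reference.lstrip(gap))
    trailing = len(reference) - len(reference.rstrip(gap))
    # slice with an explicit end index so that trailing == 0 keeps the whole tail
    end = len(target) - trailing
    return target[leading:end]


# the example from the comment: 3 gaps at the start, 4 at the end
print(trim_by_reference("---AATG-GG----", "GGGAATG-GGAAAA"))  # AATG-GG
```

Computing the end index explicitly avoids the `target[leading:-0]` pitfall, which would return an empty string when the reference has no trailing gaps.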

There is a problem with your code formatting again.

written 23 days ago by Kevin Blighe

Please help me! I need to solve this problem and I cannot find a solution.

written 23 days ago by projetoic

Your attitude leaves me discouraged [from trying to help] and frustrated.

You have also just posted exactly the same question again: How can I count the symbols in a given string / text and, as a result of that count, remove the characters

As you did not really improve anything in the new question, nothing is being resolved there either.

This one will be closed.

written 23 days ago by Kevin Blighe

This is not a Python solution, but Perl, and it's meant to be run in a unix/linux command line. Also, the sequences.fa file is expected to have unix EOL, so if that's not the case you may have to preprocess it with dos2unix or mac2unix.

Also, this solution addresses exactly what your question was about: counting the "-" characters at the beginning and at the end of each sequence. If you need to do something more, explain everything in a single question rather than opening several. The spirit of this site is to help users find their own solutions, rather than to solve users' problems for them.

modified 22 days ago • written 22 days ago by Jorge Amigo