Question: How can I count the symbols in a given string / text and, as a result of that count, remove the characters
projetoic wrote (9 weeks ago):
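# Read the whole alignment file into one string
with open('Denv4-X-gb_AY947539.txt', 'r') as f:
    z = f.read()
# Note: this iterates over single characters, so both sums count every
# '-' in the whole file rather than only the runs at the start and end
count_inicio = sum(map(lambda x: 1 if '-' in x else 0, z))
count_fim = sum(map(lambda x: 1 if '-' in x else 0, reversed(z)))
print(count_inicio, count_fim)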
Output:
479 479
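A minimal sketch of counting only the gap runs at each end, assuming one sequence has already been pulled out of the file as a single string without its header or line breaks (`flanking_gaps` is a hypothetical helper, not part of any library):

```python
def flanking_gaps(seq, gap="-"):
    """Return (leading, trailing) counts of gap characters in seq."""
    # lstrip/rstrip remove only the run of gap characters at that end
    leading = len(seq) - len(seq.lstrip(gap))
    # If the whole string is gaps, everything was already counted as leading
    if leading == len(seq):
        return leading, 0
    trailing = len(seq) - len(seq.rstrip(gap))
    return leading, trailing


print(flanking_gaps("---AATG-GG----"))  # (3, 4)
```

Internal gaps, such as the one in `AATG-GG`, are not counted, because stripping stops at the first non-gap character.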

File contents:

lcl|NC_002640.1_cds_NP_073286.1_1 [gene=POLY] [locus_tag=DV4_gp1]
     [db_xref=GeneID:5075729] [protein=polyprotein]
     [protein_id=NP_073286.1] [location=102..10265] [gbkey=CDS]
     ------------------------------------------------------------
     ---------------------------------atgaaccaacgaaaaaaggtggttaga
     ccacctttcaatatgctgaaacgcgagagaaaccgcgtatcaacccctcaagggttggtg
     aagagattctcaaccggacttttttctgggaaaggacccttacggatggtgctagcattc
     atcacgtttttgcgagtcctttccatcccaccaacagcagggattctgaagagatgggga
     cagttgaagaaaaataaggccatcaagatactgattggattcaggaaggagataggccgc
     ------------------------------------------------------------ 

gb:AY947539|Organism:Dengue virus 4|Strain
     Name:H241|Segment:null|Subtype:4|Host:Human
     ggtcgtgtggaccgacaaggacagttccaaatcggaagcttgcttaacacagttctaaca
     gtttgtttagatagagagcagatctctggaaaaatgaaccaacgaaaaaaggtggttaga
     ccacctttcaatatgctgaaacgcgagagaaaccgcgtatcaacccctcaagggttggtg
     aagagattctcaaccggacttttttccgggaaaggacccttacggatggtgctagcattc
     atcacgtttttgcgagtcctttccatcccaccaacagcagggattctgaaaagatgggga
     cagttgaagaaaaacaaggccatcaaaatactgactggattcaggaaggagataggccgc
     atgctgaacatcttgaatggaagaaaaaggtcaacaatgacattgctgtgcttgattccc

For example, I need to take the sequence lcl|NC_002640.1_cds_NP_073286.1_1 (---AATG-GG----) and count the number of "-" at the beginning and at the end.

Then, according to those counts, cut the same number of characters from the start and end of Myseq1, gb:AY947539|Organism:Dengue virus 4| (GGGAATG-GGAAAA).

That is, there are 3 "-" at the start and 4 at the end, so the output I want is AATG-GG. But first I need to count the "-" at the beginning and at the end.

How do I count symbols in one string and, based on that count, remove characters from another string in the same file?
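Because both sequences come from the same alignment, they should have the same length, so the gap counts can be applied as column offsets with string slicing. A sketch assuming both aligned sequences are already plain strings (`trim_to_reference` is a hypothetical name):

```python
def trim_to_reference(reference, target, gap="-"):
    """Cut from target the columns that are leading/trailing gaps in reference."""
    if len(reference) != len(target):
        raise ValueError("aligned sequences must have the same length")
    start = len(reference) - len(reference.lstrip(gap))  # leading gap count
    end = len(reference.rstrip(gap))  # index just past the last non-gap column
    return target[start:end]


print(trim_to_reference("---AATG-GG----", "GGGAATG-GGAAAA"))  # AATG-GG
```

Computing `end` as a positive index avoids the `target[start:-0]` pitfall: when there are no trailing gaps, `-0` equals `0` and the slice would come back empty.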


1) First, understand your format. It looks like a multiple-alignment format, so check whether Biopython has a module to read it.

2) If not, you need to read your sequences yourself. The first sequence has a 3-line header (is it not a single line? That would make reading it easier) and the second has a 2-line header; after each header come the blocks of nucleotide sequence. Put sequence 1 into a string and iterate over it to count the "-".

3) Load sequence 2 into another string and remove the corresponding blocks (use string slices).
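Steps 2 and 3 can be sketched in plain Python without Biopython, assuming the .txt file is still in FASTA layout: each record starts with a single `>` header line, followed by sequence lines that may be indented or split into blocks. The names `read_fasta` and `trim_second_by_first` are hypothetical:

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Assumes each record starts with one '>' header line; the sequence lines
    are joined and any whitespace inside them is removed.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def trim_second_by_first(text, gap="-"):
    """Step 2: count flanking gaps in record 1; step 3: slice record 2."""
    (_, ref), (header2, target) = read_fasta(text)[:2]
    start = len(ref) - len(ref.lstrip(gap))
    end = len(ref.rstrip(gap))
    return header2, target[start:end]


example = """>lcl|NC_002640.1_cds_NP_073286.1_1
---AATG-GG----
>gb:AY947539|Organism:Dengue virus 4|
GGGAATG-GGAAAA
"""
print(trim_second_by_first(example))
# ('gb:AY947539|Organism:Dengue virus 4|', 'AATG-GG')
```

For the real file, the same function could be called on `open('Denv4-X-gb_AY947539.txt').read()`. If the file turns out to be plain FASTA, Biopython's `SeqIO.parse(handle, "fasta")` should also be able to read it.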

Written 9 weeks ago by JC.

In my FASTA file the header is on a single line. I converted the file to .txt because I couldn't read it with Biopython. The alignment was done with MAFFT.

(arquivo.fasta.aln or arquivo.aln, converted to .txt)

Could you give an example?

Input:

lcl | NC_002640.1_cds_NP_073286.1_1>
---AATG-GG----
gb: AY947539 | Organism: Dengue virus 4 |
GGGAATG-GGAAAA

Output:

gb: AY947539 | Organism: Dengue virus 4 |
AATG-GG

I need to count the "-" in the first string and cut characters from the second string according to that count, so that only the CDS remains in the second string.

Written 9 weeks ago by projetoic.

Please, mate, we need an input and the expected output.

For example:

Input:

> header
------AAAA---BBBBB-----
-----------ATGCATGC---
---ATGCATGCCCCC

> GB proteinA proteinB
aactgtgactgcatgcatgactgactg
tacactactgcatgcatgactgactgc

Desired output:

> GB proteinA ----- proteinB
aactgtgactgcatgcatgactgactg
tacactactgcatgcatgactgactgc
Written 9 weeks ago by Kevin Blighe.

OK, you can check now.

Written 9 weeks ago by projetoic.
Powered by Biostar version 2.3.0