Question: Calculate the length of each sequence after adding the lengths of the previous sequences
User 6777 (United States) wrote, 2.8 years ago:

Hi all,

I want to determine the length of each sequence in a multi-FASTA file. I got this Biopython code from the Biopython manual:

from Bio import SeqIO
import sys

cmdargs = str(sys.argv)
# Print the ID and length of every record in the FASTA file given as the first argument
for seq_record in SeqIO.parse(str(sys.argv[1]), "fasta"):
    output_line = '%s\t%i' % (seq_record.id, len(seq_record))
    print(output_line)

My input file looks like this:

>Protein1
MNT
>Protein2
TSMN
>Protein3
TTQRT

And the code yields:

Protein1        3
Protein2        4
Protein3        5

But I want to calculate the length of each sequence after adding the lengths of the previous sequences. The output would look like this:

Protein1        1-3
Protein2        4-7
Protein3        8-12

I don't know which line of the code above I need to change to get that output. I'd appreciate any help on this issue. Thanks!

WouterDeCoster (Belgium) wrote, 2.8 years ago:

You probably want to store the length in a variable and, on each iteration of the loop, add the current sequence's length to it.

I don't get the 1, 4 and 8. Where do these numbers come from?

In addition, you don't really need cmdargs = str(sys.argv) (I see you don't use it downstream anyway).

I would rewrite and simplify your code to:

from Bio import SeqIO
import sys

savedlength = 0  # Initialize the variable we use to incrementally store the length
for seq_record in SeqIO.parse(sys.argv[1], "fasta"):
    savedlength += len(seq_record)  # Add this sequence's length to the running total
    print("{}\t{}".format(seq_record.id, savedlength))

Is this getting closer to what you need?
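
For what it's worth, the standard library can also compute the same running total. This is a minimal sketch, assuming the lengths have already been collected into a list (for example with len(seq_record) inside the SeqIO loop); cumulative_lengths is a hypothetical helper name, not part of Biopython:

```python
from itertools import accumulate


def cumulative_lengths(lengths):
    """Return the running total of a list of lengths.

    Same result as the savedlength loop above: each value is the current
    length plus all previous ones, e.g. [3, 4, 5] -> [3, 7, 12].
    """
    return list(accumulate(lengths))


if __name__ == "__main__":
    # Example data: the IDs and lengths from the example input (3, 4 and 5 residues)
    ids = ["Protein1", "Protein2", "Protein3"]
    lengths = [3, 4, 5]
    for rec_id, total in zip(ids, cumulative_lengths(lengths)):
        print("{}\t{}".format(rec_id, total))  # Protein1 3, Protein2 7, Protein3 12
```

Each running total is where that sequence would end if all sequences were laid end to end.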

Medhat replied, 2.8 years ago:

I think he is adding 1 to the previous running total (3+1, 7+1), but it is also not clear to me where this 1 comes from.

WouterDeCoster replied, 2.8 years ago:

It could also be that he is creating a begin and end 'position' for each sequence, but since it's unclear, I prefer to ask rather than assume something :p
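
If the begin/end interpretation is right, a minimal sketch could look like the following. It assumes 1-based, inclusive coordinates, so each sequence starts one position after the previous one ends (the 3+1 and 7+1 above), and it takes (id, length) pairs rather than Biopython records so it runs on its own; sequence_ranges is a hypothetical helper name:

```python
def sequence_ranges(records):
    """Yield (id, start, end) for each record, 1-based and inclusive.

    records: iterable of (id, length) pairs, in file order.
    """
    end = 0
    for rec_id, length in records:
        start = end + 1      # first position right after the previous sequence
        end = end + length   # running total = last position of this sequence
        yield rec_id, start, end


if __name__ == "__main__":
    # Example (id, length) pairs taken from the example input in the question
    example = [("Protein1", 3), ("Protein2", 4), ("Protein3", 5)]
    for rec_id, start, end in sequence_ranges(example):
        print("{}\t{}-{}".format(rec_id, start, end))
```

With Biopython, the pairs could come from ((rec.id, len(rec)) for rec in SeqIO.parse(sys.argv[1], "fasta")); on the example file this should print the 1-3, 4-7 and 8-12 ranges from the question.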