Question: calculate the length of a sequence after adding the length of previous sequences
3.8 years ago by
User 677720
United States
User 677720 wrote:

Hi all,

I want to determine the length of individual sequences in a multi-FASTA file. I got this Biopython code from the manual:

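``````
from Bio import SeqIO
import sys
cmdargs = str(sys.argv)
# Parse the FASTA file given as the first command-line argument
for seq_record in SeqIO.parse(str(sys.argv[1]), "fasta"):
    # One line per record: sequence ID, a tab, then the sequence length
    output_line = '%s\t%i' % (seq_record.id, len(seq_record))
    print(output_line)
``````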

My input file is like:

``````
>Protein1
MNT
>Protein2
TSMN
>Protein3
TTQRT
``````

And the code yields:

``````
Protein1        3
Protein2        4
Protein3        5
``````

But I want each sequence's length to be added to the lengths of the previous sequences, so that every sequence gets a start-end range. It would look like:

``````
Protein1        1-3
Protein2        4-7
Protein3        8-12
``````

I don't know which line of the code above I need to change to get that output. I'd appreciate any help on this issue, thanks!

sequence python fasta • 699 views
3.8 years ago by
Belgium
WouterDeCoster43k wrote:

You probably want to store a running total and add the length of the current sequence to it on each iteration of the loop.

I don't get the 1,4 and 8. Where do these numbers come from?

In addition, you don't really need `cmdargs = str(sys.argv)` (I see you also don't use it downstream).

I would rewrite and simplify your code to:

``````
from Bio import SeqIO
import sys

savedlength = 0  # Running total of the sequence lengths seen so far
for seq_record in SeqIO.parse(str(sys.argv[1]), "fasta"):
    savedlength += len(seq_record)
    print("{}\t{}".format(seq_record.id, savedlength))
``````

Is this getting closer to what you need?

I think he is adding 1 to the previous cumulative length (3+1, 7+1) to get each start, but it is also not clear to me where this 1 comes from.
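
If the 1 is read as "the next sequence starts one position after the previous one ends" (1-based, inclusive coordinates, as if the sequences were concatenated), the requested output can be sketched as below. This is only my reading of the example, not something the asker confirmed. The function name `cumulative_ranges` is made up for illustration and is not part of Biopython, and the lengths are hard-coded from the example input so the snippet runs without a FASTA file.

```python
def cumulative_ranges(records):
    """Yield (name, start, end) using 1-based, inclusive coordinates.

    `records` is any iterable of (name, length) pairs.
    Hypothetical helper for this thread, not a Biopython function.
    """
    end = 0  # end coordinate of the previous sequence (0 before the first)
    for name, length in records:
        start = end + 1  # assumption: start right after the previous end
        end += length    # the running total is the end of this sequence
        yield name, start, end


# Lengths taken from the example input in the question.
example = [("Protein1", 3), ("Protein2", 4), ("Protein3", 5)]
for name, start, end in cumulative_ranges(example):
    print("{}\t{}-{}".format(name, start, end))
```

With a real file, the same function could be fed from Biopython with something like `cumulative_ranges((rec.id, len(rec)) for rec in SeqIO.parse(sys.argv[1], "fasta"))`. Note that an empty sequence would give a start larger than its end, so you may want to handle that case separately.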