12.0 years ago
Abdullah
Hi,
I have a set of sequences in a FASTA file. I read them into Biopython, then I want to make some changes to them (adding, replacing, or deleting bases) and save them to a new FASTA file.
In this example I am doing a small shift of 5 nucleotides:
> from Bio import SeqIO
>
> allrecords = list(SeqIO.parse("test.fasta", "fasta"))
> fullseq = 0
> tochangeseq = 0
> oldfullseq = 0
>
> for y in range(len(allrecords)):
>     oldfullseq = allrecords[y].seq
>     tochange = 5  # might be set dynamically by another for loop
>     tochangeseq = allrecords[y].seq[0:tochange]     # the first 5 nucleotides
>     fullseq = oldfullseq[tochange:len(oldfullseq)]  # everything after them
>     fullseq += tochangeseq                          # move the prefix to the end
>     allrecords[y].seq = fullseq
>
> print(allrecords[0].seq)  ### it should print the new updated sequence, but it's not
The problem is that when I print the sequences after the loop, I find that they have not changed. How can I update them and then write the new version to a FASTA file?
Any help would be appreciated.
The code for the sequence manipulation is a bit convoluted. You can just do this:
newSeq = allrecords[y].seq[tochange:] + allrecords[y].seq[:tochange]
allrecords[y].seq = newSeq
But I don't see anything wrong with it. It looks like you deleted a lot of the code. Can you post the full code?
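The slicing shortcut in this reply can be tried without any FASTA file. The sketch below is only an illustration, assuming plain Python strings as stand-ins for the sequences; `rotate_left` is a hypothetical helper name, not a Biopython function. Biopython `Seq` objects support the same slicing and `+` concatenation, so the same expression should apply to `record.seq`.

```python
def rotate_left(seq, n):
    """Move the first n items of seq to its end (hypothetical helper)."""
    return seq[n:] + seq[:n]


# Plain strings stand in for record.seq here (made-up example data).
sequences = ["ACGTACGTAA", "TTTTTGGGGG"]
shifted = [rotate_left(s, 5) for s in sequences]

for before, after in zip(sequences, shifted):
    print(before, "->", after)
```

Note that if `n` is larger than the sequence length, both slices degrade gracefully and the sequence comes back unchanged.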
Hi, when I try your method, I get a memory error. Besides, that is the whole code. I just want to update the sequences and write the new ones to a new FASTA file, but it's not working.
Just do this then:
Pipe the results to a new file.
I don't think this is a good idea. I want to store the new sequences in an array because I might need them for other computations. The question is why my method is not working: if I print fullseq, it prints the new sequence, but it is not being assigned to the record object.
Honestly I don't really know why it doesn't work for you. That's why I thought maybe you didn't post the full code. I tried this:
On this test.fa:
And it worked for me, giving me this:
It's not a good idea to work on large sequences (which I assume you are doing) and store them all in-memory. This is why Damian's previously suggested method resulted in a memory error for you.
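To illustrate the memory point, here is a rough, dependency-free sketch that handles one record at a time instead of building a list of all records. It assumes a simple FASTA layout (a `>` header line followed by sequence lines); `read_fasta`, `shift_fasta`, and the demo records are hypothetical names and made-up data, not part of Biopython or of the original post.

```python
import os
import tempfile


def read_fasta(path):
    """Yield (header, sequence) pairs one record at a time."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def shift_fasta(in_path, out_path, n=5):
    """Write each record with its first n bases moved to the end."""
    count = 0
    with open(out_path, "w") as out:
        for header, seq in read_fasta(in_path):
            # Only the current record is held in memory.
            out.write(">%s\n%s\n" % (header, seq[n:] + seq[:n]))
            count += 1
    return count


# Demo on a small made-up file (illustrative records only).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "test.fasta")
dst = os.path.join(workdir, "shifted.fasta")
with open(src, "w") as fh:
    fh.write(">seq1\nAAAAACCCCC\n>seq2\nGGGGG\nTTTTT\n")

written = shift_fasta(src, dst)
with open(dst) as fh:
    result = fh.read()
print(result)
```

This sketch writes each sequence on a single line rather than wrapping it. With Biopython, the comparable pattern would be to loop over `SeqIO.parse(...)` directly instead of wrapping it in `list()`, and to pass a generator of modified records to `SeqIO.write`, which accepts an iterator as well as a list.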
On another note, I also tried your posted code with a smaller mock FASTA file, and it works just fine. It prints the new sequences, and when I inspected the SeqRecord objects, their sequences had changed as well. Have you posted the entire code?
Hi. I don't see the problem here. Your code works for me...
Hi, thank you all for your help. I found the error and it's working now, and your code works as well.
You forgot to open the 'test.fasta' file:) Try: allrecords = list(SeqIO.parse(open("test.fasta"), "fasta"))
No, in modern Biopython, SeqIO.parse can take either a filename as a string or a file handle.
good to know. thanks