Question

Trying To Get Around MemoryError: Out Of Memory Exception In Biopython Program

13.1 years ago

hpodschwit • 0

Hello all,

First, many of you will have to forgive my minimal programming skills, but I am trying to write a program that takes two paired, very large multi-sequence FASTA files (each with more than 77,000 sequences) and gets an alignment score for each pair of sequences. My current implementation uses too much memory. To get around this, I split the multi-sequence FASTA files into a bunch of smaller files to feed in one by one. Still, after getting to around the 38th score, I get a MemoryError: Out of memory exception. I will attach my code, but I was wondering if anyone had any suggestions for fixing this issue or for simply getting the alignment scores for the list. Please help. I hope I am being clear, and again, forgive any verbose/bad code.

I tried to copy and paste my code but it ended up being really messy so I have simply attached a link.

Anyway, any help is greatly appreciated. I hope you are all having a great Tuesday, and PLEASE PLEASE PLEASE HELP IF YOU ARE ABLE TO!!

biopython alignment scoring • 3.9k views

Perhaps you can use pseudocode to describe what you are doing. Are you doing pairwise alignments, for example?

13.1 years ago by Sean Davis • 27k

I can't see your link, so it is hard to guess what you are doing, unless you are trying to load 77000 sequences into RAM at the same time, which would be most unwise. Have you considered existing tools for this kind of thing, e.g. EMBOSS needleall http://emboss.open-bio.org/wiki/Appdoc:Needleall (you can parse the EMBOSS alignment output with Biopython)?

13.1 years ago by Peter • 6.0k
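
To illustrate the point above about not holding all 77,000 sequences in RAM at once, here is a minimal sketch of scoring two paired FASTA files one record pair at a time. It is not the original poster's code (the link was not visible) and makes several assumptions: the files are paired by position (the i-th record of one goes with the i-th record of the other), the score wanted is a plain Needleman-Wunsch global alignment score, and the match/mismatch/gap values are arbitrary placeholders. The names `read_fasta`, `global_score` and `score_pairs` are hypothetical helpers written in pure Python so the example is self-contained. With Biopython, `Bio.SeqIO.parse` also returns an iterator over records, so two such iterators can be walked in parallel in the same way.

```python
import io
from itertools import zip_longest


def read_fasta(handle):
    """Yield (header, sequence) one record at a time from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score, keeping only two DP rows.

    The scoring values are illustrative assumptions, not taken from the thread.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def score_pairs(handle_a, handle_b):
    """Lazily yield (id_a, id_b, score) for the i-th record of each file."""
    pairs = zip_longest(read_fasta(handle_a), read_fasta(handle_b))
    for rec_a, rec_b in pairs:
        if rec_a is None or rec_b is None:
            raise ValueError("the two FASTA files have different record counts")
        yield rec_a[0], rec_b[0], global_score(rec_a[1], rec_b[1])


# Small in-memory stand-ins for the two real files.
file_a = io.StringIO(">a1\nACGT\n>a2\nGATTACA\n")
file_b = io.StringIO(">b1\nACGT\n>b2\nGCATGCG\n")
for id_a, id_b, score in score_pairs(file_a, file_b):
    print(id_a, id_b, score)
```

Because every function here is a generator or works on a single pair, only one pair of sequences and two rows of the scoring matrix are in memory at any time, so there is no need to split the input into smaller files first. The pure-Python scoring loop is slow for long sequences; for real data a compiled tool such as needleall, as suggested above, is likely the more practical route.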