Need to extract sequences from genome fasta (SeqIO object) into output file using for loop. How do I return to the start of the genome with every iteration?
1
0
Entering edit mode
3.7 years ago
rmartson • 0

So I need to use a for loop to search through the genome for a specific chromosome, then extract sequences from that chromosome. however that only allows one run. I usually load fastas into memory using list(), but these things are around 3GB each so I can't do that. There's also nothing like seek() for a SeqIO object, so how do I return to the start of the genome for each iteration? Or alternatively, is there a much more efficient way of doing this?

Here are the relevant parts of my code (there are some bits cut out in between):

#sid is defined as the id of the chromosome a hit is found in in a "tabular with comments" BLAST output file
#start and end are the s.start and s.end of the hit within the chromosome
#genomehandle1 and organism1 are defined by input at the start of the script

genome1 = SeqIO.parse(str(genomehandle1)+".fa", "fasta")
flanks1 = open("flanks"+str(organism1)+".fa", "w")
#[...]
for chromosome in genome1:
if sid == chromosome.id:
print >> flanks1, ">"+str(qid)
print >> flanks1, str(chromosome.seq[start-100:start])+str(chromosome.seq[end:end+100])
break
#the for loop is reinitiated by another for loop above it


The "break" should just prevent the for loop I've pasted here from running beyond the relevant chromosome, right?

Currently I am getting output into flanks1, but only a single fasta sequence at the expected length when I'm expecting hundreds so I think it's just not finding any more sid == chromosome.id instances because it initiates the search from the end of the genome fasta.

Biopython blast • 1.5k views
0
Entering edit mode
3.7 years ago
James Ashmore ★ 3.1k

Make yourself a BED file containing the regions you want to extract and then try the bedtools getfasta command:

bedtools getfasta [OPTIONS] -fi <FASTA> -bed <BED>

0
Entering edit mode

eh I just loaded them to memory and it was fine

thanks though