Need to extract sequences from genome fasta (SeqIO object) into output file using for loop. How do I return to the start of the genome with every iteration?
1
0
Entering edit mode
6.8 years ago
rmartson • 0

So I need to use a for loop to search through the genome for a specific chromosome, then extract sequences from that chromosome. however that only allows one run. I usually load fastas into memory using list(), but these things are around 3GB each so I can't do that. There's also nothing like seek() for a SeqIO object, so how do I return to the start of the genome for each iteration? Or alternatively, is there a much more efficient way of doing this?

Here are the relevant parts of my code (there are some bits cut out in between):

#sid is defined as the id of the chromosome a hit is found in in a "tabular with comments" BLAST output file
#start and end are the s.start and s.end of the hit within the chromosome
#genomehandle1 and organism1 are defined by input at the start of the script

genome1 = SeqIO.parse(str(genomehandle1)+".fa", "fasta")
flanks1 = open("flanks"+str(organism1)+".fa", "w")
#[...]
for chromosome in genome1:
    if sid == chromosome.id:
        print >> flanks1, ">"+str(qid)
        print >> flanks1, str(chromosome.seq[start-100:start])+str(chromosome.seq[end:end+100])
        break
#the for loop is reinitiated by another for loop above it

The "break" should just prevent the for loop I've pasted here from running beyond the relevant chromosome, right?

Currently I am getting output into flanks1, but only a single fasta sequence at the expected length when I'm expecting hundreds so I think it's just not finding any more sid == chromosome.id instances because it initiates the search from the end of the genome fasta.

Biopython blast • 2.2k views
ADD COMMENT
0
Entering edit mode
6.8 years ago
James Ashmore ★ 3.4k

Make yourself a BED file containing the regions you want to extract and then try the bedtools getfasta command:

bedtools getfasta [OPTIONS] -fi <FASTA> -bed <BED>
ADD COMMENT
0
Entering edit mode

eh I just loaded them to memory and it was fine

thanks though

ADD REPLY

Login before adding your answer.

Traffic: 1973 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6