Question: Need to extract sequences from genome fasta (SeqIO object) into output file using for loop. How do I return to the start of the genome with every iteration?
0
gravatar for rmartson
12 months ago by
rmartson0
rmartson0 wrote:

So I need to use a for loop to search through the genome for a specific chromosome, then extract sequences from that chromosome. however that only allows one run. I usually load fastas into memory using list(), but these things are around 3GB each so I can't do that. There's also nothing like seek() for a SeqIO object, so how do I return to the start of the genome for each iteration? Or alternatively, is there a much more efficient way of doing this?

Here are the relevant parts of my code (there are some bits cut out in between):

#sid is defined as the id of the chromosome a hit is found in in a "tabular with comments" BLAST output file
#start and end are the s.start and s.end of the hit within the chromosome
#genomehandle1 and organism1 are defined by input at the start of the script

genome1 = SeqIO.parse(str(genomehandle1)+".fa", "fasta")
flanks1 = open("flanks"+str(organism1)+".fa", "w")
#[...]
for chromosome in genome1:
    if sid == chromosome.id:
        print >> flanks1, ">"+str(qid)
        print >> flanks1, str(chromosome.seq[start-100:start])+str(chromosome.seq[end:end+100])
        break
#the for loop is reinitiated by another for loop above it

The "break" should just prevent the for loop I've pasted here from running beyond the relevant chromosome, right?

Currently I am getting output into flanks1, but only a single fasta sequence at the expected length when I'm expecting hundreds so I think it's just not finding any more sid == chromosome.id instances because it initiates the search from the end of the genome fasta.

blast biopython • 405 views
ADD COMMENTlink modified 12 months ago by James Ashmore2.5k • written 12 months ago by rmartson0
0
gravatar for James Ashmore
12 months ago by
James Ashmore2.5k
UK/Edinburgh/MRC Centre for Regenerative Medicine
James Ashmore2.5k wrote:

Make yourself a BED file containing the regions you want to extract and then try the bedtools getfasta command:

bedtools getfasta [OPTIONS] -fi <FASTA> -bed <BED>
ADD COMMENTlink modified 12 months ago • written 12 months ago by James Ashmore2.5k

eh I just loaded them to memory and it was fine

thanks though

ADD REPLYlink written 12 months ago by rmartson0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1405 users visited in the last hour