Errors in Biopython code for splitting a multi-FASTA genomic file
4.5 years ago
schlogl ▴ 160

I downloaded a multi-FASTA file with 40 viral genomes from NCBI.
I read the file and tried to split it into single-genome files like this:

from Bio import SeqIO

for rec in SeqIO.parse(file, 'fasta'):
    ids = rec.id.split('|')[3]
    seqs = rec.seq
    # check for ids and seqs with len() and print(seqs[:500])
    outputfile = open('genome_' + ids + '.fasta', 'w')
    outputfile.write('>' + ids + '\n')
    outputfile.write(str(seqs))
outputfile.close()

As noted in the code comment, I printed the lengths of the IDs and sequences, and everything seemed to work fine. But when I checked the files in my directory, some of them (some of the big genomes) had a sequence length of 0. The others are alright.

Does anyone have an idea why this is happening?

Thanks for your time.

PS: I emphasized that I got many good files, so I assume the code is right; I am just asking why some files don't work. The code is simple and I don't receive any error message, but some files end up empty.

I am just asking whether someone here has run into something like this and what was done to fix it.

genome biopython • 1.0k views

Try putting the outputfile.close() in the loop, not only once at the end of it.
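
A minimal sketch of that suggestion, assuming the goal is one output file per record: a `with` block closes (and flushes) each file as soon as its record is written, so no explicit `close()` is needed. To keep the sketch free of dependencies, the small `read_fasta` helper is a hypothetical stand-in for `SeqIO.parse`; `split_fasta` and `out_dir` are names made up for illustration, and the `genome_` prefix follows the question's naming.

```python
import os


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file.

    Hypothetical stand-in for SeqIO.parse, kept simple on purpose.
    """
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, out_dir="."):
    """Write each record to its own file and return the paths written."""
    written = []
    for header, seq in read_fasta(path):
        # Use the first whitespace-delimited token of the header as the ID.
        parts = header.split()
        record_id = parts[0] if parts else "unnamed"
        # Replace '|' so the ID is safe to use in a file name.
        name = "genome_" + record_id.replace("|", "_") + ".fasta"
        out_path = os.path.join(out_dir, name)
        # The with block closes the file before the next record is opened.
        with open(out_path, "w") as out:
            out.write(">" + header + "\n")
            out.write(seq + "\n")
        written.append(out_path)
    return written
```

Calling `split_fasta('myfile.fasta')` would write the files next to the script; whether unclosed handles were really the cause of the empty files is not confirmed in this thread.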


You're using SeqIO wrong. It should be SeqIO.parse.

I think your output file is also wrong since ids will be a list, not a string.
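
For reference, `str.split` does return a list, but indexing that list gives back a single string, which is what an expression like `rec.id.split('|')[3]` relies on. A small sketch, assuming an old-style pipe-delimited NCBI identifier; the example ID is made up, and `accession_from_id` is a hypothetical helper that falls back to the whole ID when the requested field is missing.

```python
def accession_from_id(record_id, field=3):
    """Return one '|'-separated field of an ID, or the whole ID if absent."""
    fields = record_id.split("|")  # always a list of strings
    if len(fields) > field and fields[field]:
        return fields[field]
    return record_id


# Hypothetical old-style NCBI identifier, made up for illustration.
example_id = "gi|0000000|ref|NC_000000.1|"
fields = example_id.split("|")             # a list
accession = accession_from_id(example_id)  # a single string
```

The fallback matters if some headers in the file are not pipe-delimited, since indexing field 3 directly would raise an `IndexError` for those.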


Hey Joe, no, I just forgot to put '.parse' in the code. I will edit my post.

But if it were a list I would get an error message or something. Instead, most of the files are alright, and only some of them have no sequence at all.

Thanks


Are you sure rec.ids is valid? Normally it is just id.

It's not obvious to me why you'd be getting 0 length sequences from that code, so there's something else going on I think.
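
One way to narrow this down is to list exactly which output files ended up with no sequence. A rough sketch, assuming the outputs follow the `genome_*.fasta` naming from the question; `sequence_length` and `find_empty_fasta` are hypothetical helper names, and the check uses plain file reading rather than Biopython.

```python
import glob
import os


def sequence_length(path):
    """Count sequence characters in a FASTA file, skipping header lines."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def find_empty_fasta(directory, pattern="genome_*.fasta"):
    """Return sorted names of matching files that hold no sequence."""
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    return [os.path.basename(p) for p in paths if sequence_length(p) == 0]
```

Comparing the names returned by `find_empty_fasta('.')` against the record lengths printed inside the loop would show whether the empty files correspond to particular records.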


That is a typo here, because many sequences worked out. ;)


It's advisable to copy and paste your code exactly rather than re-typing it; otherwise we aren't seeing what you are seeing. In Python, where syntax and whitespace are strictly enforced, this matters even more.

4.5 years ago
schlogl ▴ 160

Got everything done with:

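from Bio import SeqIO

file = 'myfile.fasta'

with open(file, "r") as hd:
    for record in SeqIO.parse(hd, "fasta"):
        # record.id[:-2] drops the last two characters of the ID (e.g. a ".1" version suffix)
        with open(str(record.id[:-2]) + ".fasta", "w") as output_handle:
            SeqIO.write(record, output_handle, "fasta")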

Thank you guys for your time and kindness!

Paulo 8)
