Question: Getting errors in Biopython code for splitting a multi-FASTA genomic file
16 months ago by
schlogl70
Brazil-Florianopolis
schlogl70 wrote:

I downloaded a multi-FASTA file with 40 viral genomes from NCBI.
I read the file and tried to split it into single-genome files like this:

from Bio import SeqIO

for rec in SeqIO.parse(file, 'fasta'):
    ids = rec.id.split('|')[3]
    seqs = rec.seq
    # check ids and seqs with len() and print(seqs[:500])
    outputfile = open('genome_' + ids + '.fasta', 'w')
    outputfile.write('>' + ids + '\n')
    outputfile.write(str(seqs))
outputfile.close()

As noted in the code comment, I printed the lengths of the IDs and sequences, and everything seemed to work fine. But when I checked the files in my directory, some of them (some of the big genomes) contained sequences of length 0, while the others were fine.

Does anyone have an idea why this is happening?

Thanks for your time.

PS: I stressed that I got many good files, so I assume the code is right; I am just asking why some files don't work. The code is simple and I don't receive any error message, but some files end up empty.

I am just asking whether anyone here has run into something like this and what was done to fix it.

biopython genome • 368 views

Try putting the outputfile.close() in the loop, not only once at the end of it.
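
A minimal sketch of that suggestion, assuming simple (id, sequence) pairs as stand-ins for the SeqIO records; the helper name `write_genomes` and the temporary directory are only illustrative. A `with` block closes each file as soon as its record is written, so every buffer is flushed before the next file is opened:

```python
import os
import tempfile


def write_genomes(records, out_dir):
    """Write one FASTA file per (id, sequence) pair and return the paths."""
    paths = []
    for seq_id, seq in records:
        path = os.path.join(out_dir, "genome_" + seq_id + ".fasta")
        # The with-block closes (and flushes) the file on exit,
        # so no handle is left open between iterations.
        with open(path, "w") as handle:
            handle.write(">" + seq_id + "\n")
            handle.write(str(seq) + "\n")
        paths.append(path)
    return paths


# Hypothetical stand-in data; real records would come from SeqIO.parse.
demo_records = [("virusA", "ATGC" * 10), ("virusB", "GGCC" * 1000)]
demo_dir = tempfile.mkdtemp()
demo_paths = write_genomes(demo_records, demo_dir)
```

Calling `outputfile.close()` as the last statement inside the original loop should have the same effect; the `with` form just makes it harder to forget.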

written 16 months ago by WouterDeCoster45k

You're using SeqIO wrong. It should be SeqIO.parse.

I think your output file is also wrong since ids will be a list, not a string.
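
A quick check of that point, using a hypothetical identifier in the old NCBI pipe-delimited style (the actual IDs in the file are not shown in this thread). `split('|')` on its own returns a list, indexing it with `[3]` gives a string, and concatenating a list to a string raises a `TypeError` rather than failing silently:

```python
# Hypothetical record ID in the old NCBI pipe-delimited style.
example_id = "gi|9626243|ref|NC_001416.1|"

fields = example_id.split("|")   # a list of the pipe-separated fields
accession = fields[3]            # a plain string: the fourth field

# Building a file name from the string works as expected.
file_name = "genome_" + accession + ".fasta"

# Building it from the whole list raises TypeError instead.
try:
    bad_name = "genome_" + fields + ".fasta"
except TypeError:
    bad_name = None
```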

written 16 months ago by Joe19k

Hey Joe, no, I just forgot to put '.parse' in the code. I will edit my post.

But if it were a list I would get an error message or something. Most of the files came out fine, but some of them got no sequence at all.

Thanks

written 16 months ago by schlogl70

Are you sure rec.ids is valid? Normally it is just id.

It's not obvious to me why you'd be getting 0 length sequences from that code, so there's something else going on I think.
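
One way to narrow this down is to list which output files are actually empty and compare them against the record lengths printed during parsing. A small sketch, assuming the output files end in `.fasta` and sit in one directory; the helper name `find_empty_fasta` is made up for illustration:

```python
import os
import tempfile


def find_empty_fasta(directory):
    """Return the names of .fasta files in directory that are 0 bytes long."""
    empty = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.endswith(".fasta") and os.path.getsize(path) == 0:
            empty.append(name)
    return empty


# Demo on a throwaway directory with one empty and one non-empty file.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "genome_empty.fasta"), "w").close()
with open(os.path.join(demo_dir, "genome_ok.fasta"), "w") as handle:
    handle.write(">ok\nATGC\n")

empty_files = find_empty_fasta(demo_dir)
```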

written 16 months ago by Joe19k

It is just a typo here, because many sequences worked out. ;)

written 16 months ago by psschlogl30

It's advisable to copy and paste your code exactly rather than re-typing it; otherwise we aren't truly seeing what you are seeing. In Python, where syntax and whitespace are strictly enforced, this matters even more.

written 16 months ago by Joe19k
16 months ago by
schlogl70
Brazil-Florianopolis
schlogl70 wrote:

I got everything working with:

from Bio import SeqIO

file = 'myfile.fasta'

with open(file, "r") as hd:
    for record in SeqIO.parse(hd, "fasta"):
        # one output file per record, named after the ID minus its last two characters
        with open(str(record.id[:-2]) + ".fasta", "w") as output_handle:
            SeqIO.write(record, output_handle, "fasta")
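
A note on `record.id[:-2]`: the slice drops the last two characters of the ID, which presumably removes a version suffix such as `.1`. That is an assumption about the IDs in this file, which are not shown in the thread. The sketch below uses hypothetical accessions and a made-up helper, `strip_version`, to show that splitting on the last dot also copes with versions of two or more digits:

```python
def strip_version(record_id):
    """Drop a trailing '.<digits>' version from an accession, if present."""
    base, dot, version = record_id.rpartition(".")
    # rpartition returns ('', '', record_id) when there is no dot.
    if dot and version.isdigit():
        return base
    return record_id


# Hypothetical accession-style IDs.
print(strip_version("NC_001416.1"))    # NC_001416
print(strip_version("NC_001416.12"))   # NC_001416
print("NC_001416.12"[:-2])             # NC_001416. (the slice leaves the dot)
print(strip_version("NC_001416"))      # NC_001416
```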

Thank you guys for your time and kindness!

Paulo 8)
