Question: NCBI FASTA files are a bit crazy
flame233 wrote (4.0 years ago):

Hello everyone. I am new to bioinformatics, although I have been around these forums for a while.

As for my question: I am working with protein sequences from the completed genomes of a bacterium, and I have noticed that something weird (or interesting) happens to the proteomes and genomes that I download from the NCBI site.

After the downloads complete, one of two things happens with the RefSeq sequences: either an error message appears at the end of the sequence, “Resource temporarily unavailable (4).”, or every file I download has a different number of lines, a different number of genes, and a different size.

Is this normal? Maybe it is not statistically significant for the analysis? Is there any way to fix this?

Thank you, and apologies for my grammar.

fasta proteome tool genome ncbi • 1.2k views
modified 4.0 years ago by genomax • written 4.0 years ago by flame233

How are you downloading this data and from what location?

written 4.0 years ago by genomax

From: http://www.ncbi.nlm.nih.gov/nuccore/NC_017534.1
Send tab -> Coding sequences -> FASTA Protein

written 4.0 years ago by flame233

It is probably your internet connection or something similar: the download either fails to connect or breaks off in the middle (hence the different number of lines). If you are downloading something important, use FTP; don't rely on their website tool.
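One way to tell whether a given download was cut short is to summarize each file and compare the numbers between attempts. The sketch below is only an illustration: the function and class names are made up for this example, and it assumes the error text appears in the file exactly as quoted in the question.

```python
from typing import NamedTuple

# Error text as quoted in the question; assumed to appear verbatim in a bad file.
ERROR_MARKER = "Resource temporarily unavailable"


class FastaSummary(NamedTuple):
    records: int     # number of header lines (lines starting with ">")
    residues: int    # total sequence characters across all records
    has_error: bool  # True if the error text shows up anywhere in the file


def summarize_fasta(text: str) -> FastaSummary:
    """Count records and residues in FASTA text and flag the download error."""
    records = 0
    residues = 0
    has_error = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if ERROR_MARKER in line:
            has_error = True  # error line is not counted as sequence
        elif line.startswith(">"):
            records += 1
        else:
            residues += len(line)
    return FastaSummary(records, residues, has_error)
```

Calling `summarize_fasta(open(path).read())` on two downloads of the same record should give identical summaries with `has_error` set to `False`; if the counts differ, at least one of the downloads was probably truncated.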

written 4.0 years ago by marina.v.yurieva480

Yes, FTP worked perfectly. Thanks, everyone, for your help.

written 4.0 years ago by flame233
genomax (United States) wrote (4.0 years ago):

I suggest that you download the *.fna and *.faa (DNA and protein) files directly from the respective genome folders under: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/

That said, I have not had any problems downloading sequences using the method you mentioned above.
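For anyone who would rather script the FTP download than use a client, here is a minimal sketch with Python's standard `ftplib`. It assumes the host and directory above are still laid out the same way, and the genome folder name is a placeholder that you would need to look up yourself; the helper names are invented for this example.

```python
from ftplib import FTP
from pathlib import Path

HOST = "ftp.ncbi.nih.gov"       # host from the FTP address above
BASE_DIR = "/genomes/Bacteria"  # directory from the FTP address above


def pick_sequence_files(names):
    """Keep only the *.fna (DNA) and *.faa (protein) entries from a listing."""
    return sorted(n for n in names if n.endswith((".fna", ".faa")))


def download_genome_folder(ftp, folder, dest):
    """Fetch every *.fna / *.faa file in BASE_DIR/folder into dest.

    `ftp` is a logged-in ftplib.FTP (or any object with the same
    cwd/nlst/retrbinary methods). Returns the local paths written.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    ftp.cwd(f"{BASE_DIR}/{folder}")
    written = []
    for name in pick_sequence_files(ftp.nlst()):
        target = dest / name
        with open(target, "wb") as out:
            # Binary transfer avoids any line-ending translation.
            ftp.retrbinary(f"RETR {name}", out.write)
        written.append(target)
    return written


# Example call (needs network access; the folder name is a placeholder):
# with FTP(HOST) as ftp:
#     ftp.login()  # anonymous login
#     download_genome_folder(ftp, "SOME_GENOME_FOLDER", "downloads")
```

Passing the connection in as an argument keeps the download logic separate from the login step, so the same function can be reused for several genome folders in one session.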

written 4.0 years ago by genomax

Thanks, I will give it a try and post feedback :)

Edit: It worked perfectly, I can access the whole sequence with no trouble. Thanks a lot :)

modified 4.0 years ago • written 4.0 years ago by flame233