Question: Blast an organism against a downloaded database
10 days ago, caro-ca wrote:

Hi, Biostar community!

I am trying to run blastn on Linux with my assembled genome against an organism that has only one public genome assembly. That public genome comes from the same organism but a different strain. When I ran MUMmer, my assembly and the one on NCBI turned out to be quite different. My main goal is to find contaminants in my genome assembly (if there are any). The only database that suits my needs is other_genomic.gz, but when I try to gunzip it, I get this:

gzip: other_genomic.gz: unexpected end of file

What does this mean? I should be able to decompress other_genomic.gz. I hope you can help me out. Thank you in advance!

Tags: nanopore, databases, blast, ncbi
modified 10 days ago by Mensur Dlakic • written 10 days ago by caro-ca

As an alternative to gzip, you should be able to use zcat:

zcat other_genomic.gz > other_genomic.fa
modified 10 days ago • written 10 days ago by Jean-Karim Heriche

Thank you for the reply, but unfortunately, it didn't work.

written 10 days ago by caro-ca
10 days ago, Mensur Dlakic wrote:

It could mean several things: 1) your gunzip is too old (try gunzip -V; it should be 1.3 or higher); 2) if you downloaded the .gz file with a web browser, browsers sometimes decompress the file on the fly; 3) the file really is incomplete (as the message says, unexpected end of file) or was downloaded in the wrong format.

Type file other_genomic.gz and Linux will tell you what type of file you have. Try opening it with a text editor or simply go with more other_genomic.gz. If option 2 is correct, it will look like a plain FASTA file. If option 3 is correct, the contents will be garbled. You may need to download the file again using wget.

written 10 days ago by Mensur Dlakic
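
The checks described in this answer can be tried out safely on a small stand-in archive. The following is a minimal sketch, not the poster's exact procedure: the file names demo.fa, demo.fa.gz and truncated.gz are hypothetical, a copy of the archive is cut short to mimic an interrupted download, and gzip -t is used to test integrity without writing any decompressed output.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small stand-in FASTA file and compress it (hypothetical names).
printf '>seq1\nACGTACGTACGT\n>seq2\nTTGGCCAATTGG\n' > demo.fa
gzip -c demo.fa > demo.fa.gz

# Mimic an interrupted download by keeping only the first 20 bytes.
head -c 20 demo.fa.gz > truncated.gz

# gzip -t tests integrity without writing any output file;
# a non-zero exit status means the archive is damaged or incomplete.
for f in demo.fa.gz truncated.gz; do
    if gzip -t "$f" 2>/dev/null; then
        echo "$f: OK"
    else
        echo "$f: damaged or incomplete"
    fi
done
```

On the real file, gzip -t other_genomic.gz performs the same test; for a file of that size it has to read the whole archive, so expect it to take a while.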

You can check that the downloaded file is not corrupted by checking the corresponding md5 checksum (run in the same directory as other_genomic.gz):

md5sum -c other_genomic.gz.md5
written 10 days ago by Jean-Karim Heriche
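
For reference, here is a small sketch of how md5sum -c behaves, using a stand-in file rather than the real database. The names demo.gz and demo.gz.md5 are hypothetical, and GNU coreutils md5sum is assumed. The point is that the .md5 file lists a hash and a file name, and md5sum -c looks for that file name relative to the current directory.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a small stand-in archive (hypothetical name).
printf '>seq1\nACGT\n' | gzip -c > demo.gz

# A checksum file holds lines of the form "<md5 hash>  <file name>".
md5sum demo.gz > demo.gz.md5

# Intact file: md5sum -c reports OK and exits with status 0.
md5sum -c demo.gz.md5

# Simulate a damaged download by appending a stray byte.
printf 'x' >> demo.gz

# Now the hashes differ: md5sum -c reports FAILED and exits non-zero.
md5sum -c demo.gz.md5 || echo "checksum mismatch: download the file again"
```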

Thank you so much for your response. I checked your suggestions: 1) my gzip is version 1.6; 2) I downloaded the .gz file with a web browser. When I type file other_genomic.gz I get:

other_genomic.gz: gzip compressed data, last modified: Thu May 23 04:34:35 2019, from Unix

When I try to do more other_genomic.gz I get:

~/CH12_Contaminacion$ more other_genomic.gz 

When I try md5sum -c other_genomic.gz.md5:

md5sum: other_genomic.gz.md5: No such file or directory

For context, the database is about 1 TB in size. When I try to BLAST with Blast2GO, it needs a FASTA file. I downloaded the .gz file and uncompressed it, but instead of a .fasta file I get an executable on Linux.

written 10 days ago by caro-ca

Note that the file is on the order of 0.3 TB, so even at an optimal 100 Mb/s it would still take 7-8 h to download. I have rarely had a connection stay open that long. You can resume an interrupted download using the --continue option of wget.

written 9 days ago by Jean-Karim Heriche
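
The download-time estimate can be checked with a quick calculation. This is a rough sketch under stated assumptions: 0.3 TB is taken as 0.3 × 10^12 bytes, the link is assumed to sustain a full 100 Mb/s (megabits per second) with no protocol overhead, and the variable names are illustrative.

```shell
size_tb=0.3       # file size in terabytes (10^12 bytes each), assumed
rate_mbps=100     # sustained link speed in megabits per second, assumed

# bytes -> bits (x8), divide by bits per second, convert seconds to hours.
hours=$(awk -v s="$size_tb" -v r="$rate_mbps" \
    'BEGIN { printf "%.1f", (s * 1e12 * 8) / (r * 1e6) / 3600 }')

echo "Estimated download time: ${hours} h"
```

Under these idealized assumptions the result is a little under 7 h. Reading 0.3 TB in binary units (0.3 × 2^40 bytes) gives about 7.3 h, and any protocol overhead pushes the figure further into the 7-8 h range.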

The md5 checksum file (other_genomic.gz.md5) has to be downloaded into the same directory as other_genomic.gz, and the md5sum command has to be run from that directory.
What do you mean when you say you get an executable?

written 10 days ago by Jean-Karim Heriche

You have a proper gunzip and the type of your file is correct, so it seems that your download was incomplete - exactly as the error message indicated. Instead of downloading with your browser, copy the link from the right-click menu, and paste it after the wget command:

wget your_copied_URL

Sometimes the last few KBs in a large file take a while to write to disk, and that may have caused the incomplete download. As long as you wait for wget to finish, gunzip should work afterwards.

modified 9 days ago • written 9 days ago by Mensur Dlakic
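
Putting the advice in this thread together, one possible workflow is to resume the download with wget and only trust the file once gzip confirms the archive is complete. The sketch below is an illustration rather than anything posted in the thread: fetch_and_verify is a hypothetical helper name, the retry count is an arbitrary choice, and the output file name is assumed to be the last component of the URL.

```shell
# fetch_and_verify URL
# Hypothetical helper: resume a download with wget, then test the archive.
fetch_and_verify() {
    fv_url=$1
    # Assumption: wget saves the file under the last component of the URL.
    fv_out=$(basename "$fv_url")

    # --continue resumes a partial file instead of starting over;
    # --tries sets how many times wget retries after a failure.
    wget --continue --tries=5 "$fv_url" || return 1

    # Only trust the file once gzip confirms the archive is complete.
    if gzip -t "$fv_out"; then
        echo "$fv_out: download complete and archive intact"
    else
        echo "$fv_out: still incomplete or corrupted; run the command again" >&2
        return 1
    fi
}

# Usage (your_copied_URL is a placeholder, as in the post above):
# fetch_and_verify your_copied_URL
```

Because --continue picks up where the partial file ends, rerunning the same command after a dropped connection should not restart the transfer from zero, provided the server supports resuming.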