Question: Kraken: unable to download the databases from NCBI
karthic100 wrote 11 months ago:

Hi All,

After installing Kraken, I am trying to build the database as described in the manual, but I get the following messages. Any suggestions?

/Tools/kraken-master/KRAKEN$ ./kraken-build --standard --threads 40 --db /home/karthic/Databases/KRAKEN
Found jellyfish v1.1.11
Step 1/3: performing rsync dry run...
Rsync dry run complete, removing any non-existent files from manifest.
Step 2/3: Performing rsync file transfer of requested files
rsync: failed to connect to ftp.ncbi.nlm.nih.gov (165.112.9.229): Connection timed out (110)
rsync: failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::7): Network is unreachable (101)
rsync error: error in socket IO (code 10) at clientserver.c(128) [Receiver=3.1.1]
rsync_from_ncbi.pl: rsync error, exited with code 10

Thanks in advance, KK

modified 11 months ago by Joseph Hughes2.6k • written 11 months ago by karthic100

You are probably behind a firewall or proxy, so Kraken cannot reach NCBI via rsync (rsync:// transfers use TCP port 873, which firewalls often block). If that is the case, you may want to talk with your local sysadmins. There are workarounds, but they will depend on your local setup.
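One way to check the firewall explanation is to see whether a plain TCP connection to the NCBI host succeeds on the rsync port (873) but fails where the FTP (21) or HTTPS (443) ports succeed. This is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper, and an open port only shows that the host is reachable, not that a transfer will complete.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers timeouts, refused connections, DNS failures
        # and "Network is unreachable" errors like the ones in the log above.
        return False
```

For example, `can_connect("ftp.ncbi.nlm.nih.gov", 873)` returning False while `can_connect("ftp.ncbi.nlm.nih.gov", 443)` returns True would point to the rsync port being blocked rather than NCBI being down.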

modified 11 months ago • written 11 months ago by genomax60k

Are you able to download anything from the NCBI FTP server using wget?

written 11 months ago by dylan.lawrence10

Yes, I can download with wget.

written 11 months ago by karthic100

Hi,

I was hitting the same rsync error. The way I got around it was to change the rsync_from_ncbi.pl script to use wget instead. I changed line 70 from:

if (system("rsync --no-motd --files-from=manifest.txt rsync://ftp.ncbi.nlm.nih.gov/genomes/ .") != 0) {

to

if (system("wget -nc -nH -x --cut-dirs=1 -i manifest.txt -B ftp://ftp.ncbi.nlm.nih.gov/genomes/ .") != 0) {

It worked once I got wget to behave the same way as the rsync command. I don't know how it will affect database updates; I was creating a new database when I ran into this error. Good luck!
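Why those wget flags reproduce the rsync layout: rsync copies each manifest path, taken relative to `genomes/`, into the current directory. wget's `-B` resolves the same relative paths against `ftp://ftp.ncbi.nlm.nih.gov/genomes/`, `-nH` drops the host-name directory, `--cut-dirs=1` drops `genomes/`, `-x` recreates the remaining directories, and `-nc` skips files that already exist. Below is a minimal Python sketch of that mapping, assuming each line of `manifest.txt` holds one path relative to `genomes/` (as the rsync `--files-from` call implies). `plan_downloads` and `download_all` are hypothetical names, not part of Kraken.

```python
import os
import urllib.request
from urllib.parse import urljoin, urlparse

# Base URL taken from the wget command above.
BASE_URL = "ftp://ftp.ncbi.nlm.nih.gov/genomes/"


def plan_downloads(manifest_lines, base_url=BASE_URL, cut_dirs=1):
    """Map manifest paths to (url, local_path), like wget -B base -nH -x --cut-dirs=N."""
    plan = []
    for line in manifest_lines:
        rel = line.strip().lstrip("/")
        if not rel:
            continue  # ignore blank lines
        url = urljoin(base_url, rel)  # -B: resolve relative to the base URL
        # -nH drops the host name; --cut-dirs drops the first N remote directories
        parts = [p for p in urlparse(url).path.split("/") if p]
        plan.append((url, "/".join(parts[cut_dirs:])))
    return plan


def download_all(plan, dest="."):
    """Download each planned file under dest, skipping existing files (like wget -nc)."""
    for url, local in plan:
        target = os.path.join(dest, *local.split("/"))  # -x: recreate the directory tree
        if os.path.exists(target):
            continue  # an existing file counts as done, even if it is truncated
        os.makedirs(os.path.dirname(target), exist_ok=True)
        urllib.request.urlretrieve(url, target)
```

A manifest entry such as `all/GCF/000/001/x.fna.gz` (a made-up path) ends up at the same relative location that rsync would have used. Like `-nc`, this sketch treats any existing file as complete, including one left truncated by an interrupted run.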

written 11 months ago by dereksarovich0

Worked for me, thanks!

written 4 months ago by doron0

Thanks, it works for me too. However, if the download is interrupted, it downloads the existing files again in full; it cannot resume from where it stopped. Did the "-nc" flag not work?
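The `-nc` (no-clobber) flag only stops wget from fetching a file that already exists locally; it never continues a partially downloaded one. Resuming means asking the server for just the missing bytes, which is what wget's `-c` (`--continue`) option does. The sketch below shows the idea with an HTTP `Range` request. It assumes the files are fetched over HTTPS (NCBI also serves the FTP tree at `https://ftp.ncbi.nlm.nih.gov/`; check this for your setup) and that the server honours byte ranges. `build_request` and `resume_download` are hypothetical helpers.

```python
import os
import urllib.error
import urllib.request


def build_request(url, local_path):
    """Build a GET request that asks only for the bytes missing from local_path."""
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    req = urllib.request.Request(url)
    if offset:
        req.add_header("Range", f"bytes={offset}-")
    return req, offset


def resume_download(url, local_path, chunk_size=1 << 20):
    """Download url to local_path, continuing a partial file when the server allows it."""
    req, offset = build_request(url, local_path)
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as err:
        if err.code == 416:
            return  # Range Not Satisfiable: assume the local file is already complete
        raise
    with resp:
        # 206 Partial Content: the server honoured the Range header, so append.
        # A plain 200 means it sent the whole file, so start over.
        mode = "ab" if offset and resp.status == 206 else "wb"
        with open(local_path, mode) as out:
            while True:
                block = resp.read(chunk_size)
                if not block:
                    break
                out.write(block)
```

Checking the status code matters: a server that ignores `Range` replies with the full file, and appending that to the partial copy would corrupt it.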

written 11 weeks ago by 16327469070
Joseph Hughes2.6k (Scotland, UK) wrote 11 months ago:

Since NCBI restructured its FTP site and decided to phase out GenBank Identifiers (GIs), the default Kraken database update scripts no longer work.

My colleague @Sej Modha has written a Python script that helps with updating the Kraken databases: http://bioinformatics.cvr.ac.uk/blog/update-kraken-databases/
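Without GIs in the headers, each sequence needs its taxonomy ID supplied another way. Kraken (version 1) accepts an explicit `kraken:taxid|<taxid>` tag in the sequence ID; its manual gives `>sequence16|kraken:taxid|32630  Adapter sequence` as an example. The sketch below shows that header rewrite in plain Python. It is not the linked script: the function names and the ID-to-taxid mapping passed in are hypothetical, and in practice the taxids would come from NCBI's metadata.

```python
def add_kraken_taxid(header, taxid):
    """Rewrite a FASTA header so its sequence ID carries a kraken:taxid tag."""
    body = header[1:] if header.startswith(">") else header
    parts = body.split(maxsplit=1)  # [sequence ID, optional description]
    if not parts:
        raise ValueError("empty FASTA header")
    rest = f" {parts[1]}" if len(parts) > 1 else ""
    return f">{parts[0]}|kraken:taxid|{taxid}{rest}"


def kraken_format(fasta_lines, taxid_for):
    """Yield FASTA lines with every header tagged; taxid_for maps sequence ID -> taxid."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            seq_id = line[1:].split(maxsplit=1)[0]
            yield add_kraken_taxid(line, taxid_for[seq_id])
        else:
            yield line  # sequence lines pass through unchanged
```

For example, `add_kraken_taxid(">sequence16 Adapter sequence", 32630)` gives `>sequence16|kraken:taxid|32630 Adapter sequence`.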

modified 11 months ago • written 11 months ago by Joseph Hughes2.6k

Good to know. Has this been raised as an issue with the Kraken developers?

written 11 months ago by genomax60k

I believe Derrick Wood, the Kraken developer, has moved on to pastures new.

written 11 months ago by Joseph Hughes2.6k

Hi Joseph,

I tried the script, but it is not working. I get the following error:

/Tools/kraken-master$ python Update_kraken_db.py
  File "Update_kraken_db.py", line 18
    if len(sys.argv) > 1:
    ^

written 11 months ago by karthic100

Hi Karthic,

Something is wrong with the code formatting on the WordPress blog: the code-formatting plugin has changed the code on line 18.

Please download the script from GitHub and try again, and let me know if there are any problems.
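A quick way to catch this kind of copy-and-paste damage is to syntax-check a script before running it. Python's built-in `compile()` parses source without executing it; from the command line, `python -m py_compile Update_kraken_db.py` performs a similar check. The helper below is a hypothetical sketch, and the mangled line in the test assumes the plugin HTML-escaped `>` as `&gt;`, which is only one possible kind of damage.

```python
def find_syntax_error(source, filename="<script>"):
    """Return (line_number, message) for the first syntax error, or None if source parses."""
    try:
        compile(source, filename, "exec")  # parse and compile only; nothing is executed
    except SyntaxError as err:
        return err.lineno, err.msg
    return None
```

Running it on the downloaded file's text before executing it points straight at the damaged line instead of failing partway through a long job.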

modified 11 months ago • written 11 months ago by Sej Modha4.0k

Hey Sej,

Thank you for the solution. The script is working.

Regards, KK

written 11 months ago by karthic100

Hello Sej Modha, I have been using your script, but at some point the following error appears:

sys:1: DtypeWarning: Columns (20) have mixed types. Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
  File "./UpdateKrakenDatabases.py", line 118, in <module>
    get_fasta_in_kraken_format('human_genome.fa')
  File "./UpdateKrakenDatabases.py", line 98, in get_fasta_in_kraken_format
    for seq_record in records:
  File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/SeqIO/__init__.py", line 600, in parse
    for r in i:
  File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/GenBank/Scanner.py", line 478, in parse_records
    record = self.parse(handle, do_features)
  File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/GenBank/Scanner.py", line 462, in parse
    if self.feed(handle, consumer, do_features):
  File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/GenBank/Scanner.py", line 430, in feed
    self._feed_header_lines(consumer, self.parse_header())
  File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/GenBank/Scanner.py", line 1436, in _feed_header_lines
    structured_comment_key = re.search(r"([^#]+){0}$".format(STRUCTURED_COMMENT_START), data).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Any help?

modified 10 months ago • written 10 months ago by guillepalou40

Hi there, I have updated the script to explicitly specify the dtype; the updated version is available to download from GitHub.
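The `DtypeWarning` comes from pandas inferring column types chunk by chunk while reading a large file, so column 20 ended up with different types in different chunks. The remedies the warning itself names are passing `dtype=` to `read_csv` (for example `dtype=str`) or setting `low_memory=False`. As a dependency-free illustration of the same idea, this sketch reads a tab-separated table with the standard `csv` module, keeps every field as a string, and converts only the column that is needed; the column names and sample data are hypothetical.

```python
import csv


def read_tsv_as_strings(handle):
    """Read a tab-separated table with a header row; every value stays a str."""
    return list(csv.DictReader(handle, delimiter="\t"))


def column_as_int(rows, column):
    """Convert one column explicitly; non-integer values (e.g. 'na') become None."""
    values = []
    for row in rows:
        try:
            values.append(int(row[column].strip()))
        except ValueError:
            values.append(None)
    return values
```

Deciding types per column up front means a stray non-numeric value cannot silently change how the rest of the column is parsed.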

written 10 months ago by Sej Modha4.0k

Thank you for the help!

written 10 months ago by guillepalou40