5.2 years ago
karthic

Hi All,

After installing Kraken, I am trying to build the database as described in the manual, but I get the following messages. Any input on this?

/Tools/kraken-master/KRAKEN$ ./kraken-build --standard --threads 40 --db /home/karthic/Databases/KRAKEN
Found jellyfish v1.1.11
Step 1/3: performing rsync dry run...
Rsync dry run complete, removing any non-existent files from manifest.
Step 2/3: Performing rsync file transfer of requested files
rsync: failed to connect to ftp.ncbi.nlm.nih.gov (165.112.9.229): Connection timed out (110)
rsync: failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::7): Network is unreachable (101)
rsync error: error in socket IO (code 10) at clientserver.c(128) [Receiver=3.1.1]
rsync_from_ncbi.pl: rsync error, exited with code 10

Thanks in advance,
KK

You are probably behind a firewall/proxy, so Kraken cannot reach NCBI via rsync. If that is the case, you may want to talk to your local sysadmins. There are workarounds, but they depend on your local setup.

Are you able to download anything from the NCBI FTP server using wget?

Yes, I can download with wget.

Hi, I was hitting the same rsync error. I got around it by changing the rsync_from_ncbi.pl script to use wget instead. I changed line 70 from:

if (system("rsync --no-motd --files-from=manifest.txt rsync://ftp.ncbi.nlm.nih.gov/genomes/ .") != 0) {

to:

if (system("wget -nc -nH -x --cut-dirs=1 -i manifest.txt -B ftp://ftp.ncbi.nlm.nih.gov/genomes/ .") != 0) {

It worked once I managed to get wget to behave the same way as the rsync command. I don't know how it will affect database updates; I was creating a new database when I ran into this error. Good luck!

Worked for me, thanks!

Thanks, it works for me too. However, if the download is interrupted, rerunning it downloads the already existing files again in full; it cannot resume from where it stopped. Did the "-nc" flag not work?
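For comparison, here is a minimal Python sketch (not part of Kraken) of what the modified Perl line asks wget to do. It assumes manifest.txt lists paths relative to the genomes/ directory, as implied by the original rsync command, and that files go into the current directory. With -nH -x --cut-dirs=1 and -B, each entry is resolved against the base URL and saved under the same relative path. Existing files are skipped, which is roughly the -nc (no-clobber) behaviour. `plan_downloads` and `download` are hypothetical helper names.

```python
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Assumed base: the same URL the modified rsync_from_ncbi.pl passes to wget -B.
BASE_URL = "ftp://ftp.ncbi.nlm.nih.gov/genomes/"


def plan_downloads(manifest_lines, dest_dir=".", base_url=BASE_URL):
    """Return (url, local_path) pairs for manifest entries not yet on disk.

    Entries are treated as paths relative to genomes/, so the local layout
    matches what -nH -x --cut-dirs=1 produces: no hostname directory, remote
    subdirectories recreated, leading 'genomes/' dropped.
    Existing files are skipped, roughly like wget's -nc (no-clobber).
    """
    dest = Path(dest_dir)
    plan = []
    for line in manifest_lines:
        rel = line.strip().lstrip("/")  # keep entries relative to base_url
        if not rel:
            continue  # ignore blank lines
        local = dest / rel
        if local.exists():
            continue  # no-clobber: keep whatever is already there
        plan.append((urljoin(base_url, rel), local))
    return plan


def download(plan):
    """Fetch each planned file, creating parent directories as needed."""
    for url, local in plan:
        local.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, local)
```

Skipping existing files only protects complete downloads. A file cut off mid-transfer also exists on disk, so both this sketch and -nc would keep the truncated copy. For resuming partially downloaded files, wget's -c (--continue) option is the one intended for that.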
5.2 years ago
Joseph Hughes

Since NCBI updated their FTP site and decided to phase out GenBank Identifiers (GIs), the default Kraken database update scripts no longer work. My colleague @Sej Modha has written a Python script that helps with updating Kraken databases: http://bioinformatics.cvr.ac.uk/blog/update-kraken-databases/

Good to know. Has this been raised as an issue with the Kraken developers?

I believe Derrick Wood, the Kraken developer, has moved on to pastures new.

Hi Joseph, I tried the script but it is not working. I get the following error:

/Tools/kraken-master$ python Update_kraken_db.py
File "Update_kraken_db.py", line 18
if len(sys.argv) > 1:
^

Hi Karthic,

Something went wrong with the code formatting on WordPress: the code-formatting plugin changed the code on line 18.

Please download the script from GitHub and try again, and let me know if there are any problems.

Hey Sej,

Thank you for the solution. The script is working.

Regards, KK
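For context on what such an update script has to produce: Kraken 1 relied on GI numbers in sequence headers to look up taxa. Its manual describes an alternative for sequences without usable GIs: put the string kraken:taxid|XXX in the sequence ID. The manual's example is ">sequence16|kraken:taxid|32630  Adapter sequence". The traceback further down shows the script has a get_fasta_in_kraken_format function, which presumably does something along these lines (we have not checked). Below is a minimal plain-Python sketch that tags FASTA headers this way. It assumes you already have an accession-to-taxid mapping, for example built from NCBI's accession2taxid files. `fasta_id`, `to_kraken_header` and `rewrite_fasta` are hypothetical helpers, not the script's code.

```python
def fasta_id(header):
    """Return the sequence ID: the first whitespace-delimited word after '>'."""
    return header[1:].split(None, 1)[0]


def to_kraken_header(header, taxid):
    """Append Kraken's explicit taxon tag to the ID, keeping any description.

    '>NC_000001.11 Homo sapiens chromosome 1' with taxid 9606 becomes
    '>NC_000001.11|kraken:taxid|9606 Homo sapiens chromosome 1'.
    """
    parts = header[1:].strip().split(None, 1)
    tagged = f">{parts[0]}|kraken:taxid|{taxid}"
    return f"{tagged} {parts[1]}" if len(parts) > 1 else tagged


def rewrite_fasta(lines, taxid_for):
    """Yield FASTA lines with every header tagged; sequence lines pass through.

    taxid_for maps sequence ID -> NCBI taxonomy ID (assumed to exist already).
    An unmapped ID raises KeyError rather than silently leaving a record untagged.
    """
    for line in lines:
        if line.startswith(">"):
            yield to_kraken_header(line, taxid_for[fasta_id(line)]) + "\n"
        else:
            yield line
```

Because `rewrite_fasta` works line by line on any iterable, it can stream a large genome file (e.g. an open file handle) without loading it into memory.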

Hello Sej Modha, I have been using your script, but at some point the following error appears:

sys:1: DtypeWarning: Columns (20) have mixed types. Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
File "./UpdateKrakenDatabases.py", line 118, in <module>
get_fasta_in_kraken_format('human_genome.fa')
File "./UpdateKrakenDatabases.py", line 98, in get_fasta_in_kraken_format
for seq_record in records:
File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/SeqIO/__init__.py", line 600, in parse
for r in i:
File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/GenBank/Scanner.py", line 478, in parse_records
record = self.parse(handle, do_features)
File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/GenBank/Scanner.py", line 462, in parse
if self.feed(handle, consumer, do_features):
File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/GenBank/Scanner.py", line 430, in feed
File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/GenBank/Scanner.py", line 1436, in _feed_header_lines
structured_comment_key = re.search(r"([^#]+){0}\$".format(STRUCTURED_COMMENT_START), data).group(1)
AttributeError: 'NoneType' object has no attribute 'group'


Any help?

Hi there, I have updated the script to specify the dtype explicitly; the updated version is available to download from GitHub.
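The thread doesn't show the exact change. In pandas, the usual fixes are the two the warning itself names: pass `dtype` to `read_csv` (e.g. `dtype=str`, or a dict covering just the offending column), or set `low_memory=False`. As a dependency-free alternative (Python's `csv` module swapped in for pandas), here is a sketch that reads a tab-separated table with every field left as a string, so no column can change type partway through the file. The tab delimiter and the `#` comment lines are assumptions about the input. `read_tsv_as_strings` is a hypothetical helper, not part of the script.

```python
import csv


def read_tsv_as_strings(path, comment_prefix="#"):
    """Read a tab-separated table, keeping every field as a plain string.

    No type inference happens, so a column mixing numbers and text (what
    pandas' DtypeWarning reports) is read consistently throughout.
    Lines starting with comment_prefix are skipped. Quote characters are
    kept literally (QUOTE_NONE), assuming the table uses no CSV-style quoting.
    """
    with open(path, newline="") as handle:
        data_lines = (line for line in handle if not line.startswith(comment_prefix))
        return list(csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE))
```

Any numeric conversion then happens explicitly, only for the columns that need it.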

Thank you for the help!