How to download BBTools sketch databases using fetchRefSeq?
Asked 2.5 years ago by O.rka

Does anyone know how to properly download the sketch databases for BBTools?

I got this working back in 2019, but I can't get it to work with the newest version. Here's the error I get from the fetch scripts:

BBMap version: 38.93

RefSeq

(bbmap_env) -bash-4.2$ /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/bbmap_env/opt/bbmap-38.93-0/pipelines/fetch/fetchRefSeq.sh
java -ea -Xmx1g -Xms1g -cp /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/bbmap_env/opt/bbmap-38.93-0/current/ tax.RenameGiToTaxid -Xmx1g in=stdin.fa.gz out=renamed.fa.gz pigz=16 unpigz zl=9 server ow maxbadheaders=5000 badheaders=badHeaders.txt bgzip
Executing tax.RenameGiToTaxid [-Xmx1g, in=stdin.fa.gz, out=renamed.fa.gz, pigz=16, unpigz, zl=9, server, ow, maxbadheaders=5000, badheaders=badHeaders.txt, bgzip]

Time:                           488.852 seconds.
Reads Processed:       39175    0.08k reads/sec
Bases Processed:       1273m    2.61m bases/sec

Valid Sequences:    39175
Valid Bases:        1273716988
Invalid Sequences:  0
Invalid Bases:      0
Exception in thread "main" java.lang.RuntimeException: tax.RenameGiToTaxid terminated in an error state; the output may be corrupt.
    at tax.RenameGiToTaxid.process(RenameGiToTaxid.java:307)
    at tax.RenameGiToTaxid.main(RenameGiToTaxid.java:39)
Tags: metagenomics, bbmap, sketch, bbtools

My guess is that, since NCBI has deprecated GI numbers, this script no longer works.
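One quick way to test this guess is to look at the bad-header log that the renamer was told to write (`badheaders=badHeaders.txt`, with `maxbadheaders=5000` in the logged command). This is a minimal sketch, assuming it is run from the directory where fetchRefSeq.sh was launched and that the file lists one rejected header per line; `check_bad_headers` is a hypothetical helper, not part of BBTools.

```bash
# Hypothetical helper: summarize the bad-header log written by tax.RenameGiToTaxid.
# Assumption: the file holds one rejected header per line.
check_bad_headers() {
  local f="${1:-badHeaders.txt}"   # path taken from badheaders=badHeaders.txt
  local limit="${2:-5000}"         # mirrors maxbadheaders=5000 in the log
  if [ ! -s "$f" ]; then
    echo "no bad headers recorded in $f"
    return 0
  fi
  local n
  # Arithmetic expansion strips any padding that some wc versions print.
  n=$(( $(wc -l < "$f") ))
  echo "$n bad headers in $f (limit $limit)"
  head -n 5 "$f"                   # show a few examples of rejected headers
  [ "$n" -le "$limit" ]            # non-zero exit status if the limit was exceeded
}
```

Running `check_bad_headers` with no arguments would report how many headers could not be renamed and print the first few; a count near or above the limit would be consistent with the GI-number explanation.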

Do you know if there are any plans to update these scripts? I'm a big fan of the BBTools suite and would like to include it in pipelines that I'm actively developing and may publish.

Do you remember what the end result of this script was when it ran in 2019? Does it just download RefSeq data? If so, I may be able to suggest an alternative.

The end result was a set of 31 sketch files, taxa0.sketch through taxa30.sketch:

(base) -bash-4.2$ cd bbtools/refseq_2019-02-14/
(base) -bash-4.2$ ls
taxa0.sketch   taxa12.sketch  taxa15.sketch  taxa18.sketch  taxa20.sketch  taxa23.sketch  taxa26.sketch  taxa29.sketch  taxa3.sketch  taxa6.sketch  taxa9.sketch
taxa10.sketch  taxa13.sketch  taxa16.sketch  taxa19.sketch  taxa21.sketch  taxa24.sketch  taxa27.sketch  taxa2.sketch   taxa4.sketch  taxa7.sketch
taxa11.sketch  taxa14.sketch  taxa17.sketch  taxa1.sketch   taxa22.sketch  taxa25.sketch  taxa28.sketch  taxa30.sketch  taxa5.sketch  taxa8.sketch

Each one looks like this:

(base) -bash-4.2$ head -n 3 taxa0.sketch && echo " " && tail -n 3 taxa0.sketch
#SZ:186 CD:ADC  K:32,24 H:2 GS:1438 GK:1376 GE:1017 GQ:2    ID:1036773  NM:Leucosphaerina arxii NM0:tid|1036773|NR_145040.1 Leucosphaerina arxii CBS 737.84 ITS region; from TYPE material
1bGG;ZE`em
7^oOg_Ke7

1OjMd:ZN
GNEj@mj
33mT:RYS
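To see which taxa ended up in a given file, one can pull the ID and NM fields out of each sketch header. The ID value matches the `tid|1036773|` prefix in NM0, so it appears to be the NCBI taxID. This is a minimal sketch, assuming every sketch begins with a header line starting `#SZ:` and that header fields are tab-separated (the uneven spacing in the output above suggests tabs); `list_sketch_taxa` is a hypothetical helper, not a BBTools command.

```bash
# Hypothetical helper: print "ID<TAB>NM" for every sketch header in a .sketch file.
# Assumptions: headers start with "#SZ:" and their fields are tab-separated.
list_sketch_taxa() {
  awk -F'\t' '
    /^#SZ:/ {
      id = ""; name = ""
      for (i = 1; i <= NF; i++) {
        # "NM:" does not match "NM0:", so only the short name is taken.
        if ($i ~ /^ID:/) id = substr($i, 4)
        else if ($i ~ /^NM:/) name = substr($i, 4)
      }
      print id "\t" name
    }
  ' "$1"
}
```

For example, `list_sketch_taxa taxa0.sketch | head` would print the first few taxID/name pairs from taxa0.sketch, such as `1036773` and `Leucosphaerina arxii` for the header shown above.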