Question: Downloading full BlastKOALA results?
EduardoFox10 (IBCCF / UFRJ) wrote:

I have just started using BlastKOALA (KEGG), which has been useful for annotating amino acid sequences. This is their website: https://www.kegg.jp/blastkoala/

When you get results, there are links for downloading. However, these links do not download the detailed results for each query, only the general notes already shown on the screen. To get the detailed results I need to click on each query result on the page manually, which becomes impractical with >500 entries. So I think what I need is a tool to download all contents linked from a web page. I have been trying wget, but it doesn't work: it says 'Requested Job Not Found' whatever I do.

Please, has anyone ever tried to achieve this? Thanks in advance.

lelle770 (Berlin) wrote:

I had a quick look at this on my BlastKOALA result.

When I click on one of my queries I get a detailed list of matches. The list has a URL like this:

https://www.kegg.jp/kegg-bin/blastkoala_result_gene_list?id=39732d974cf46cbc344f96d5d7e81bb69c18dcea&passwd=x3XXyz&type=blastkoala&code=user&target=g1%2Et1

If I run

wget "https://www.kegg.jp/kegg-bin/blastkoala_result_gene_list?id=39732d974cf46cbc344f96d5d7e81bb69c18dcea&passwd=x3XXyz&type=blastkoala&code=user&target=g1%2Et1" -O g1.t1_hits.html

I get a file called g1.t1_hits.html (because of the -O option).

If I change the last parameter of the URL (target=g1%2Et1) to a different protein name, I get the results for the corresponding protein.

Maybe you are missing the quotation marks in your wget command? Without them, the shell treats each & in the URL as "run in background", so wget only receives the URL up to the first & and the server cannot find your job.
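
If you don't want to type the protein names by hand, something along these lines might pull them out of a saved copy of the overview page. This is only a sketch: it assumes the overview page (saved here under the hypothetical name overview.html) embeds the same target=... links the browser shows, which is worth checking on your own result first:

# hypothetical: collect the URL-encoded target value of every per-query link
grep -o 'target=[^"&]*' overview.html | sed 's/^target=//' | sort -u > prot.txt

Conveniently, the extracted values stay URL-encoded (g1%2Et1 rather than g1.t1), so they can be substituted straight back into the result URL.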

ADD COMMENTlink written 12 weeks ago by lelle770

EduardoFox10 replied:

Thanks for testing the download! However, you will see that the downloaded page contains just what is already shown on the screen, which I could easily get by selecting all and pasting into a text editor. I'd like to download the detailed results for each queried protein, which you can only see by clicking on it directly. In other words, I'd like to download all the HTML pages linked from the page you just downloaded. Please, would you know how to set this up in wget? I cannot get all the links. Thanks!


lelle770 replied:

The way I would do this is by writing a bash script that calls wget for each protein ID. Something like this:

# fetch the detailed hit list for each protein ID listed in prot.txt
while read PROT; do
  echo "$PROT"
  wget "https://www.kegg.jp/kegg-bin/blastkoala_result_gene_list?id=39732d974cf46cbc344f96d5d7e81bb69c18dcea&passwd=XXxxXX&type=blastkoala&code=user&target=${PROT}" -O "${PROT}_koala.html"
done < prot.txt

where prot.txt is a file with one protein ID per line.
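
If you would rather build prot.txt from the FASTA file you submitted, the IDs can be taken from the headers. A minimal sketch, assuming your input is called proteins.fasta (a hypothetical name) and that BlastKOALA uses the first word of each header as the query name:

# hypothetical: take the first word of every FASTA header as the protein ID
grep '^>' proteins.fasta | sed 's/^>//; s/ .*//' > prot.txt

A literal dot is valid in a URL query string, so IDs like g1.t1 should work as-is; if the server insists on the %2E encoding seen in the result URLs, piping through sed 's/\./%2E/g' would reproduce it.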
