Question: Downloading full BlastKOALA results?
14 months ago by
EduardoFox10 wrote:

I have just started using BlastKOALA (KEGG), which has been useful for annotating amino acid sequences. This is their website:

When you get results, there are links for downloading. However, these links do not download the detailed search results for each query, only the general summary already shown on the screen. To get the detailed results I need to manually click on each query result on the page, which becomes impractical with >500 entries. So I think what I need is a tool to download all linked contents from a webpage. I have been trying 'wget', but it doesn't work: it says 'Requested Job Not Found' whatever I do.

Has anyone ever tried to achieve this? Thanks in advance.

13 months ago by
lelle800 wrote:

I had a quick look at this on my BlastKOALA result.

When I click on one of my queries I get a detailed list of matches. The list has a URL like this:

If I run

wget "" -O g1.t1_hits.html

I get a file called g1.t1_hits.html (because of the -O option).

If I change the last parameter of the URL (target=g1%2Et1) to a different protein name, I get the result for the corresponding protein.
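
In case it helps, here is a minimal sketch of how the value of that last parameter could be built from a plain protein ID. It assumes the IDs are plain ASCII and that the site accepts a percent-escape for every character other than letters, digits, '-' and '_'; I have only seen this for the dot (g1.t1 becomes g1%2Et1). The function name encode_target is made up for this example.

```shell
# Percent-encode a protein ID for use as the target= value.
# Assumption: IDs are ASCII; everything except letters, digits,
# '-' and '_' is escaped as %XX (so '.' becomes %2E).
encode_target() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # the string without its first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;  # character code as two hex digits
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}
```

Calling encode_target g1.t1 prints g1%2Et1, which is the form that appears in the URL above.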

Maybe you are missing the quotation marks in your wget command?


Thanks for testing the download! However, you will see that the downloaded page is just what already shows on the screen, which I could easily get by selecting all and pasting into a text editor. I'd like to download the detailed results for each queried protein, which you can only see by clicking directly on it. In other words, I'd like to download all HTML pages linked from the page you just downloaded. Would you know how to set this up in wget? I cannot get all the links. Thanks!

written 13 months ago by EduardoFox10

The way I would do this is to write a bash script that calls wget once for each protein ID. Something like this:

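while read -r PROT; do
  echo "$PROT"
  # the result-page URL goes inside the quotes, with ${PROT} as the value of its last parameter (target=)
  wget "${PROT}" -O "${PROT}_koala.html"
done < prot.txt

Where prot.txt is a file with one protein ID per line.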
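
If the queries were submitted as a FASTA file, prot.txt could probably be generated from it rather than typed by hand. This is only a sketch, assuming the protein IDs shown by BlastKOALA are the first word of each FASTA header line; fasta_ids and queries.fasta are hypothetical names.

```shell
# Print one protein ID per line from a FASTA file.
# Assumption: the ID is the first whitespace-delimited word after '>'.
fasta_ids() {
  # -n plus the trailing p prints only header lines, reduced to the ID
  sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$1"
}
```

Usage would be: fasta_ids queries.fasta > prot.txt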

written 13 months ago by lelle800