Download rRNAs.fasta for all bacteria from database
9 months ago
kamanovae ▴ 100

Hi!

I want to download the rRNA sequences for all bacteria in the database at http://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/human-gut/v2.0.1/. I know this can be done recursively with wget, but I don't understand exactly how. I need to enter a folder and then recursively walk through its subfolders.


Inside folder1 there are many subfolders, each of which contains two folders (one of them named "genome"). The "genome" folder holds many files, but I only need the one named FileName_rRNAs.fasta. Sometimes that file is missing.

An example download path might look like this:

species_catalogue/MGYG0000000/MGYG000000001/genome/MGYG000000001_rRNAs.fasta

Maybe you know what wget command I need?

I would be grateful for any help!

wget NCBI • 620 views

That web/FTP site has a robots.txt file that disallows crawling, and wget honors robots.txt by default during recursive retrieval, so a plain recursive wget will not crawl it. One is supposed to respect this setting.

That said, the folders follow a specific URL structure, so you can build the direct URLs yourself and loop over them to download the fasta files you are looking for. I have confirmed that this works. In any case, be a good citizen and put appropriate pauses between requests if you choose to download this way.
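A minimal Python sketch of that idea, not a tested recipe for this server. It assumes the parent folder name is the first 11 characters of the accession, which is inferred only from the example path in the question (MGYG0000000/MGYG000000001). The names `build_url` and `download_rrnas`, the output folder, and the one-second pause are all hypothetical choices, and you have to supply the list of accessions yourself. A 404 is treated as "this genome has no rRNAs file", since the question says the file is sometimes absent.

```python
import time
import urllib.error
import urllib.request
from pathlib import Path

# Base URL taken from the question.
BASE = ("http://ftp.ebi.ac.uk/pub/databases/metagenomics/"
        "mgnify_genomes/human-gut/v2.0.1/")


def build_url(accession: str) -> str:
    """Build the direct URL of the rRNAs fasta file for one accession."""
    # Assumption: the parent folder is the accession's first 11 characters,
    # as in the example path MGYG0000000/MGYG000000001.
    bucket = accession[:11]
    return (f"{BASE}species_catalogue/{bucket}/{accession}"
            f"/genome/{accession}_rRNAs.fasta")


def download_rrnas(accessions, out_dir="rRNAs", pause_seconds=1.0):
    """Download each file, pausing between requests; return missing accessions."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    missing = []
    for accession in accessions:
        target = out / f"{accession}_rRNAs.fasta"
        try:
            urllib.request.urlretrieve(build_url(accession), str(target))
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise  # anything other than "not found" is a real problem
            missing.append(accession)  # the file is sometimes not there
        time.sleep(pause_seconds)  # be polite to the server
    return missing
```

Calling `download_rrnas(["MGYG000000001"])` would fetch the example file from the question into a local `rRNAs` folder and return the accessions for which no file was found.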


Use the REST API via the Python toolkit to download the files: https://pypi.org/project/mg-toolkit/
