Question: Downloading Data From MG-RAST
6.6 years ago by
bioinfo790
bioinfo790 wrote:

Is anyone familiar with downloading data from MG-RAST? I have more than 100 metagenome IDs that need to be downloaded efficiently. I found this link at MG-RAST (http://api.metagenomics.anl.gov/1/api.html#download) but couldn't manage to download those 100 metagenomes using their IDs (e.g. 4441908.3). I don't want to download them one by one with individual IDs, as that would take ages.

modified 2.2 years ago by Dattatray Mongad350 • written 6.6 years ago by bioinfo790
6.5 years ago by
5heikki9.0k
Finland
5heikki9.0k wrote:

I did something like this a while back:

cat keywordTableSortedUniqueIds.txt
mgm4440036.3
mgm4440037.3
mgm4440038.3
mgm4440039.3
mgm4440040.3
mgm4440041.3
mgm4440055.3
mgm4440056.3
..

while read -r line
do
    curl "http://api.metagenomics.anl.gov/1/download/${line}?file=425.1" > "$line.gz"
done < keywordTableSortedUniqueIds.txt

The "file=XXX" part specifies what exactly you want to download from the given metagenome, e.g. 425.1 here specifies predicted rRNA.
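The loop above can also be wrapped in a small function so that the file ID becomes a parameter instead of being hard-coded. This is only a sketch: the names mgrast_url and download_all are made up for illustration, the base URL is simply the one used in this thread (it may have changed since), and the .gz suffix just follows the original loop.

```shell
#!/usr/bin/env bash
# Sketch only: mgrast_url and download_all are hypothetical helper names.
# The base URL is the one quoted in this thread and may no longer be current.
API_BASE="http://api.metagenomics.anl.gov/1/download"

# Build the download URL for one metagenome ID ($1) and one file ID ($2).
mgrast_url() {
    printf '%s/%s?file=%s\n' "$API_BASE" "$1" "$2"
}

# Read metagenome IDs (one per line) from a file ($1) and fetch file ID ($2) for each.
download_all() {
    local id_list=$1 file_id=$2 id
    # "|| [ -n "$id" ]" keeps the last line even if the file lacks a trailing newline
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue   # skip blank lines
        # -f: treat HTTP errors as failures; -sS: quiet, but still print errors
        # The .gz suffix follows the original loop; the real format may differ per file ID.
        curl -fsS "$(mgrast_url "$id" "$file_id")" -o "${id}.gz" \
            || echo "download failed for $id" >&2
    done < "$id_list"
}
```

It would be called as: download_all keywordTableSortedUniqueIds.txt 425.1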

written 6.5 years ago by 5heikki9.0k

That was very helpful. I have just gone through the MG-RAST manual but didn't find much information about the download stages or "file=xxx/stage=xxx". As you mentioned, file=425.1 is for predicted rRNA. Do you know which file number I should use for the raw, originally submitted metagenome FASTA sequences? I tried "file=100.2" but I'm not sure if it is right.

modified 6.5 years ago • written 6.5 years ago by bioinfo790

Hey, I'm not sure you can access the raw data through the API. However, I think file=100.2 contains the reads/contigs that passed quality filtering. There's probably also a file that contains the reads/contigs that didn't pass QC, so you could combine the two if you really wanted the raw data. You could always ask on the MG-RAST mailing list.

modified 6.5 years ago • written 6.5 years ago by 5heikki9.0k

Thanks. I have now decided to go for the reads that passed the QC filtering and dereplication stages.

written 6.5 years ago by bioinfo790

Hi, Thanks for these details about how to download data from the MG-RAST api. Did you add your webkey to access data that is not public yet? Or were these public metagenomes? I tried adding my webkey:

curl -H "auth: XXX" http://api.metagenomics.anl.gov/1/download/"$line"?file=100.2

but I just get a summary of the file info (bp_count etc), and I'm unable to download the fasta file.

Thank you!
Katrine

modified 8 months ago by RamRS30k • written 6.3 years ago by Katrine20

To download the raw data (the data an MG-RAST user uploaded as input), use file=050.1. Alternatively, you can check for yourself how the download address is constructed by inspecting the "Download" button element and the URL it leads to.

Example: on the download page "http://www.mg-rast.org/mgmain.html?mgpage=download&metagenome=mgm4549958.3/MG_RAST_sub/BLANES_2010_cDNA_SURFACE_0.8.3__ILLUMINA.fna", go to "Processing step" -> "0. Upload" and inspect the download button element on the right. Here it is "http://api-ui.mg-rast.org/download/mgm4549958.3?file=050.1".

written 7 months ago by al-ash140

File 050.2 - This is the unfiltered metagenome that was originally uploaded to MG-RAST
File 100.1 - preprocess.passed.fna
File 100.2 - preprocess.removed (low quality)
File 350.2 & 350.3 - These are the protein coding genes (amino acids and nucleotides)
File 440.1 - These are predicted rRNA sequences (I do not recommend using MG-RAST for sensitive rRNA annotation. It does not use the internal structure of the gene, which other programs appropriately use for classification)
File 550.1 - This file shows clustered sequences which are 90% identical, to reduce the number of sequences that need to be annotated. Many folks don’t even know that this happens within MG-RAST.
File 650.1 & 650.2 - These files are essentially the blat tabular output from comparing your sequence to the database.

see example: http://metagenomics.anl.gov/metagenomics.cgi?page=DownloadMetagenome&metagenome=4447943.3

http://api.metagenomics.anl.gov/1/download/mgm4447943.3

ref: http://adina-howe.readthedocs.io/en/latest/mgrast/index.html
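
If several of the files listed above are wanted for the same metagenome, the URLs can be generated in one go. A minimal sketch, assuming the same base URL as in the example above; file_urls and fetch_files are hypothetical helper names, and the file IDs are just the ones from the list.

```shell
#!/usr/bin/env bash
# Sketch only: file_urls and fetch_files are hypothetical helper names.
# The base URL is the one quoted above and may no longer be current.
API_BASE="http://api.metagenomics.anl.gov/1/download"

# Print one download URL per requested file ID for a single metagenome.
# Usage: file_urls <metagenome_id> <file_id> [<file_id> ...]
file_urls() {
    local mg_id=$1 file_id
    shift
    for file_id in "$@"; do
        printf '%s/%s?file=%s\n' "$API_BASE" "$mg_id" "$file_id"
    done
}

# Download each requested file ID for a single metagenome.
# Output files are named <metagenome_id>_<file_id>.
fetch_files() {
    local mg_id=$1 file_id
    shift
    for file_id in "$@"; do
        curl -fsS "$API_BASE/$mg_id?file=$file_id" -o "${mg_id}_${file_id}" \
            || echo "download failed: $mg_id file $file_id" >&2
    done
}
```

Running file_urls mgm4447943.3 050.2 100.1 350.2 first prints the URLs so they can be checked by eye; fetch_files takes the same arguments and does the actual download.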

modified 2.2 years ago • written 2.2 years ago by Zhilong Jia1.6k

Hello, what language is this besides curl? I have a windows computer and am using curl through the command prompt, and would like to do something similar to what you described. 

written 5.5 years ago by kbrannen0

It's Bash. I presume you could get Bash working on Windows with Cygwin or something but I haven't used Windows since XP so I can't really say. Alternatively, it probably doesn't take much effort to make a simple while read line loop with something that works on Windows by default like Python (?) or Java (?). If you plan to do lots of bioinformatics in the future, I suggest you ditch Windows for Linux or OS X.

modified 5.5 years ago • written 5.5 years ago by 5heikki9.0k

Update: I downloaded Git Bash for Windows, and I think I am having success using Bash and curl with the command you listed. I am not very experienced with Bash yet, so I haven't added any echo tests to see whether the script is working the way I think it is. I do agree with you that Windows is a hassle, but there are often workarounds. I will continue down this line for the people who use Windows and cannot afford to, or do not want to, switch operating systems.
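
One way to do the kind of echo test mentioned above is a dry run that prints each curl command instead of executing it. A minimal sketch, assuming the ID file and URL pattern from the answer above; dry_run is a hypothetical helper name. It also strips carriage returns, since an ID list saved on Windows may have CRLF line endings.

```shell
#!/usr/bin/env bash
# Sketch only: dry_run is a hypothetical helper name.
# It prints the curl command for each metagenome ID without running it.
# Usage: dry_run <id_list_file> <file_id>
dry_run() {
    local id_list=$1 file_id=$2 id
    # tr removes carriage returns in case the list has Windows (CRLF) line endings
    tr -d '\r' < "$id_list" | while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue   # skip blank lines
        printf 'curl "http://api.metagenomics.anl.gov/1/download/%s?file=%s" > %s.gz\n' \
            "$id" "$file_id" "$id"
    done
}
```

Calling dry_run keywordTableSortedUniqueIds.txt 425.1 should list one command per ID; if the list looks right, the real loop can be run.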

written 5.5 years ago by kbrannen0
2.2 years ago by
National Centre for Cell Science, Pune
Dattatray Mongad350 wrote:

I have a question. I have downloaded project files using the MG-RAST tools: mg-download.py --project projectid

But now I want the metadata for a particular project (which file corresponds to which sample type?). How can I download the metadata?

written 2.2 years ago by Dattatray Mongad350