Downloading Data From Mg-Rast
10.1 years ago
bioinfo ▴ 830

Is anyone familiar with downloading data from MG-RAST? I have more than 100 metagenome IDs that need to be downloaded efficiently. I found this page at MG-RAST (http://api.metagenomics.anl.gov/1/api.html#download) but couldn't manage to download those 100 metagenomes using their IDs (e.g. 4441908.3). I don't want to download them one by one with individual IDs, as that would take ages!

10.1 years ago
5heikki 11k

I did something like this a while back:

cat keywordTableSortedUniqueIds.txt
mgm4440036.3
mgm4440037.3
mgm4440038.3
mgm4440039.3
mgm4440040.3
mgm4440041.3
mgm4440055.3
mgm4440056.3
..

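# read one metagenome ID per line and save each download as <ID>.gz
while read -r line
do
curl "http://api.metagenomics.anl.gov/1/download/${line}?file=425.1" > "$line.gz"
done < keywordTableSortedUniqueIds.txt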

The "file=XXX" part specifies what exactly you want to download from the given metagenome, e.g. 425.1 here specifies predicted rRNA.
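A slightly more defensive version of the same loop, as a sketch only. It assumes the v1 download endpoint used above and an ID file with one metagenome per line. It adds the mgm prefix when an ID is given bare (like 4441908.3 in the question), since every working URL in this thread uses the mgm-prefixed form. It takes the file= value as a parameter and makes curl fail on HTTP errors instead of silently saving an error page. The names to_mgm_id, build_url and download_all are hypothetical helpers, not part of any MG-RAST tool.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the v1 download endpoint from the answer above
# and a text file with one metagenome ID per line.

API_BASE="http://api.metagenomics.anl.gov/1/download"

# Hypothetical helper: add the "mgm" prefix when an ID is given bare (e.g. 4441908.3).
to_mgm_id() {
    case "$1" in
        mgm*) printf '%s\n' "$1" ;;
        *)    printf 'mgm%s\n' "$1" ;;
    esac
}

# Hypothetical helper: build the download URL for one metagenome ($1) and file ID ($2).
build_url() {
    printf '%s/%s?file=%s\n' "$API_BASE" "$(to_mgm_id "$1")" "$2"
}

# Hypothetical helper: fetch file ID $2 for every metagenome listed in file $1.
download_all() {
    local list="$1" file_id="$2" line id
    # "|| [ -n "$line" ]" also processes a last line that has no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue        # skip blank lines
        id="$(to_mgm_id "$line")"
        # -f: fail on HTTP errors instead of saving the error body;
        # a failure is reported and the loop moves on to the next ID.
        curl -f -sS -o "${id}.${file_id}.gz" "$(build_url "$id" "$file_id")" \
            || echo "failed: $id" >&2
    done < "$list"
}
```

Usage: download_all keywordTableSortedUniqueIds.txt 425.1 (the .gz suffix is kept from the loop above).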


That was very helpful. I have just gone through the MG-RAST manual but didn't find much information about the download stages (file=xxx / stage=xxx). You mentioned file=425.1 for predicted rRNA. Do you know which file number I should use for the raw, originally submitted metagenome FASTA sequences? I tried file=100.2 but I'm not sure it is right.


Hey, I'm not sure you can get access to the raw data through the API. However, I think file=100.2 contains the reads/contigs that passed quality filtering. There is probably also a file that contains the reads/contigs that didn't pass QC, so you could combine the two if you really wanted them. You could always ask on the MG-RAST mailing list.
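If you do want both sets, merging them is a concatenation once they are on disk. The sketch below assumes the QC-passed and QC-removed reads have already been downloaded as two uncompressed FASTA files (decompress them first if they arrive gzipped). Which file ID holds which set should be confirmed before relying on it. merge_qc_sets and count_fasta are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records (lines starting with '>').
count_fasta() {
    grep -c '^>' "$1"
}

# Hypothetical helper: merge two FASTA files into $3 and check the record count.
merge_qc_sets() {
    local passed="$1" removed="$2" out="$3" expected
    # 'awk 1' prints every line with a trailing newline, so a file lacking a
    # final newline cannot glue its last line onto the next file's first header.
    awk 1 "$passed" "$removed" > "$out"
    expected=$(( $(count_fasta "$passed") + $(count_fasta "$removed") ))
    if [ "$(count_fasta "$out")" -eq "$expected" ]; then
        echo "merged $expected records into $out"
    else
        echo "record count mismatch in $out" >&2
        return 1
    fi
}
```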


Thanks. I have now decided to go with the reads that passed the QC filtering and dereplication stages.


Hi, thanks for these details about how to download data from the MG-RAST API. Did you add your webkey to access data that is not public yet, or were these public metagenomes? I tried adding my webkey:

curl -H "auth: XXX" http://api.metagenomics.anl.gov/1/download/"$line"?file=100.2

but I just get a summary of the file info (bp_count etc), and I'm unable to download the fasta file.

Thank you!
Katrine
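A rough sketch for checking which of the two you actually received. It first checks for the gzip magic bytes. Otherwise it looks at the first non-whitespace character, assuming that the summary is a JSON object (the API's usual response format) and that sequence files start with > (FASTA) or @ (FASTQ). sniff_download is a hypothetical name. It only diagnoses the symptom and does not explain why the summary was returned instead of the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess what a downloaded file contains.
# Assumption: API summaries are JSON; sequence files are FASTA/FASTQ or gzip.
sniff_download() {
    local f="$1" magic first
    # gzip files start with the bytes 1f 8b
    magic="$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')"
    if [ "$magic" = "1f8b" ]; then
        echo "gzip"
        return
    fi
    # otherwise classify by the first non-whitespace character
    first="$(tr -d ' \t\r\n' < "$f" | head -c 1)"
    case "$first" in
        '{'|'[') echo "json" ;;
        '>')     echo "fasta" ;;
        '@')     echo "fastq" ;;
        *)       echo "unknown" ;;
    esac
}
```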


To download the raw data (the data an MG-RAST user uploaded as input), use file=050.1. Alternatively, you can check for yourself how the download address is constructed by inspecting the "Download" button element and the URL it points to.

Example: In the download page "http://www.mg-rast.org/mgmain.html?mgpage=download&metagenome=mgm4549958.3/MG_RAST_sub/BLANES_2010_cDNA_SURFACE_0.8.3__ILLUMINA.fna" go to the "Processing step" -> "0. Upload" and inspect the download button element on the right. Here it is "http://api-ui.mg-rast.org/download/mgm4549958.3?file=050.1"

File 050.2 - This is the unfiltered metagenome that was originally uploaded to MG-RAST
File 100.1 - preprocess.passed.fna
File 100.2 - preprocess.removed (low quality)
File 350.2 & 350.3 - These are the protein coding genes (amino acids and nucleotides)
File 440.1 - These are predicted rRNA sequences (I do not recommend using MG-RAST for sensitive rRNA annotation. It does not use the internal structure of the gene, which other programs appropriately use for classification)
File 550.1 - This file shows clustered sequences which are 90% identical, to reduce the number of sequences that need to be annotated. Many folks don’t even know that this happens within MG-RAST.
File 650.1 & 650.2 - These files are essentially the blat tabular output from comparing your sequence to the database.
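The list above can be kept next to a download script as a lookup table. In this sketch, describe_file_id and urls_for_stages are hypothetical names. The descriptions are copied from the list rather than checked against current MG-RAST documentation. A case statement is used instead of an associative array so it also runs on Bash 3 (the macOS default).

```shell
#!/usr/bin/env bash
# File IDs as described in the reply above (not verified against current MG-RAST docs).
describe_file_id() {
    case "$1" in
        050.2)       echo "unfiltered metagenome as originally uploaded" ;;
        100.1)       echo "preprocess.passed.fna" ;;
        100.2)       echo "preprocess.removed (low quality)" ;;
        350.2|350.3) echo "protein coding genes" ;;
        440.1)       echo "predicted rRNA sequences" ;;
        550.1)       echo "clusters of 90% identical sequences" ;;
        650.1|650.2) echo "blat tabular output against the database" ;;
        *)           echo "unknown"; return 1 ;;
    esac
}

# Hypothetical helper: print one download URL per requested file ID for a
# single metagenome ($1); IDs missing from the table only trigger a warning.
urls_for_stages() {
    local mgm="$1" fid
    shift
    for fid in "$@"; do
        if ! describe_file_id "$fid" > /dev/null; then
            echo "warning: file ID $fid has no description in this table" >&2
        fi
        printf 'http://api.metagenomics.anl.gov/1/download/%s?file=%s\n' "$mgm" "$fid"
    done
}
```

Each printed URL can then be passed to curl with an explicit -o file name.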

see example: http://metagenomics.anl.gov/metagenomics.cgi?page=DownloadMetagenome&metagenome=4447943.3

http://api.metagenomics.anl.gov/1/download/mgm4447943.3

ref: http://adina-howe.readthedocs.io/en/latest/mgrast/index.html


Hello, what language is this, besides curl? I have a Windows computer and am using curl through the command prompt, and I would like to do something similar to what you described.


It's Bash. I presume you could get Bash working on Windows with Cygwin or something but I haven't used Windows since XP so I can't really say. Alternatively, it probably doesn't take much effort to make a simple while read line loop with something that works on Windows by default like Python (?) or Java (?). If you plan to do lots of bioinformatics in the future, I suggest you ditch Windows for Linux or OS X.


Update: I downloaded Git Bash for Windows, and I think I am having success using Bash and curl with the command you listed. I am not very experienced with Bash yet, so I haven't added any echo tests to check that the script is working the way I think it is. I do agree with you that Windows is a hassle, but there are often workarounds. I will continue down this path for people who use Windows and cannot afford to, or do not want to, switch operating systems.
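One way to add such an echo test is a dry-run switch that prints each curl command instead of running it. This is a sketch. The DRY_RUN variable and the plan_downloads name are hypothetical. It also strips Windows (CRLF) line endings, which an ID file saved with a Windows editor may contain and which would otherwise end up inside the URL.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper around the download loop from the first answer.
# DRY_RUN defaults to 1 here, so nothing is fetched unless DRY_RUN=0 is set.
plan_downloads() {
    local list="$1" file_id="$2" line url
    while IFS= read -r line || [ -n "$line" ]; do
        line="${line%$'\r'}"            # drop a trailing carriage return (CRLF files)
        [ -z "$line" ] && continue      # skip blank lines
        url="http://api.metagenomics.anl.gov/1/download/${line}?file=${file_id}"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf 'curl -f -o %s.gz "%s"\n' "$line" "$url"
        else
            curl -f -o "${line}.gz" "$url"
        fi
    done < "$list"
}
```

Run it once as is to inspect the printed commands, then run it again with DRY_RUN=0 to download.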

5.8 years ago

I have a question. I have downloaded project files using the MG-RAST tools script mg-download.py --project projectid

But now I want the metadata for a particular project (which file corresponds to which sample type?). How do I download the metadata?
