Question: Downloading Data From MG-RAST
5.7 years ago, bioinfo wrote:

Is anyone familiar with downloading data from MG-RAST? I have more than 100 metagenome IDs that need to be downloaded efficiently. I found this link at MG-RAST (http://api.metagenomics.anl.gov/1/api.html#download) but couldn't manage to download those 100 metagenomes using their IDs (e.g. 4441908.3). I don't want to download them one by one with individual IDs, as it would take ages.

parsing • 8.8k views
modified 16 months ago by Dattatray Mongad • written 5.7 years ago by bioinfo

5.7 years ago, 5heikki (Finland) wrote:

I did something like this a while back:

cat keywordTableSortedUniqueIds.txt
mgm4440036.3
mgm4440037.3
mgm4440038.3
mgm4440039.3
mgm4440040.3
mgm4440041.3
mgm4440055.3
mgm4440056.3
..

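while read -r line
do
    # file=425.1 selects the predicted rRNA file; each result is saved as <id>.gz
    curl "http://api.metagenomics.anl.gov/1/download/${line}?file=425.1" > "${line}.gz"
done < keywordTableSortedUniqueIds.txt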

The "file=XXX" part specifies what exactly you want to download from the given metagenome, e.g. 425.1 here specifies predicted rRNA.
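
The loop above can be made a little more defensive. What follows is only a minimal sketch, not a tested MG-RAST client: it reuses the base URL and the file= parameter from the loop above, and the helper names (normalize_id, build_url, download_all) are made up for illustration. It assumes the API expects the mgm-prefixed form of the ID, as in the list above, and it keeps the same .gz naming for the output.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the base URL and the file=
# parameter are taken from the loop above.

API_BASE="http://api.metagenomics.anl.gov/1/download"

# Accept "4441908.3" or "mgm4441908.3" and always return the mgm-prefixed form.
# (Assumption: the API wants the mgm prefix, as in the ID list above.)
normalize_id() {
    local id="$1"
    case "$id" in
        mgm*) printf '%s\n' "$id" ;;
        *)    printf 'mgm%s\n' "$id" ;;
    esac
}

# Build the download URL for one metagenome ID and one file ID (e.g. 425.1).
build_url() {
    printf '%s/%s?file=%s\n' "$API_BASE" "$(normalize_id "$1")" "$2"
}

# Download the given file ID for every metagenome ID in a text file
# (one ID per line). Failed downloads are reported and skipped.
download_all() {
    local id_file="$1" file_id="$2" line id
    while read -r line; do
        line="$(printf '%s' "$line" | tr -d '\r')"   # tolerate Windows line endings
        [ -z "$line" ] && continue                   # skip blank lines
        id="$(normalize_id "$line")"
        echo "Downloading $id (file=$file_id)" >&2
        # --fail makes curl return an error on HTTP failures instead of
        # saving the error page as if it were data.
        if ! curl --fail --silent --show-error --location \
                  --output "${id}.gz" "$(build_url "$id" "$file_id")"; then
            echo "FAILED: $id" >&2
            rm -f "${id}.gz"
        fi
    done < "$id_file"
}
```

With these functions loaded, something like `download_all keywordTableSortedUniqueIds.txt 425.1` should behave like the original loop, except that failed requests are reported instead of being written to disk.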


That was very helpful. I have just gone through the MG-RAST manual but didn't find much information about the download stages or the "file=xxx"/"stage=xxx" parameters. As you mentioned, file=425.1 is for predicted rRNA. Do you know which file number I should use for the raw, originally submitted metagenome FASTA sequences? I tried "file=100.2" but I'm not sure if it is right.

Reply modified 5.7 years ago • written 5.7 years ago by bioinfo

Hey, I'm not sure you can access the raw data through the API; however, I think file=100.2 contains the reads/contigs that passed quality filtering. There's probably also a file that contains the reads/contigs that didn't pass QC, so you could combine the two if you really wanted the raw data. You could always ask on the MG-RAST mailing list.

Reply modified 5.7 years ago • written 5.7 years ago by 5heikki

Thanks. I have now decided to go with the reads that passed the QC filtering and dereplication stages.

Reply written 5.7 years ago by bioinfo

Hi, thanks for these details about how to download data from the MG-RAST API. Did you add your webkey to access data that is not public yet, or were these public metagenomes? I tried adding my webkey:

curl -H "auth: XXX" http://api.metagenomics.anl.gov/1/download/"$line"?file=100.2

but I just get a summary of the file info (bp_count etc.), and I'm unable to download the FASTA file.

Thank you!

Katrine

Reply written 5.4 years ago by Katrine

File 050.2 - This is the unfiltered metagenome that was originally uploaded to MG-RAST
File 100.1 - preprocess.passed.fna
File 100.2 - preprocess.removed (low quality)
File 350.2 & 350.3 - These are the protein coding genes (amino acids and nucleotides)
File 440.1 - These are predicted rRNA sequences (I do not recommend using MG-RAST for sensitive rRNA annotation. It does not use the internal structure of the gene, which other programs appropriately use for classification)
File 550.1 - This file shows clustered sequences which are 90% identical, to reduce the number of sequences that need to be annotated. Many folks don’t even know that this happens within MG-RAST.
File 650.1 & 650.2 - These files are essentially the blat tabular output from comparing your sequence to the database.

see example: http://metagenomics.anl.gov/metagenomics.cgi?page=DownloadMetagenome&metagenome=4447943.3

http://api.metagenomics.anl.gov/1/download/mgm4447943.3

ref: http://adina-howe.readthedocs.io/en/latest/mgrast/index.html
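
The file IDs listed above can be kept in a small lookup function so that a download script can label what it fetched. This is just a sketch: describe_file_id is a hypothetical helper, and the descriptions are copied from the list above. The numbering may differ between MG-RAST pipeline versions (the answer earlier in this thread uses 425.1 for predicted rRNA, for example), so treat the table as an assumption to check against the download page of your own metagenome.

```shell
# Map an MG-RAST file ID to the description given in the list above.
# Prints the description, or returns 1 for an ID that is not in the list.
describe_file_id() {
    case "$1" in
        050.2)       echo "unfiltered metagenome as originally uploaded" ;;
        100.1)       echo "preprocess.passed.fna" ;;
        100.2)       echo "preprocess.removed (low quality)" ;;
        350.2|350.3) echo "protein coding genes (amino acids and nucleotides)" ;;
        440.1)       echo "predicted rRNA sequences" ;;
        550.1)       echo "clustered sequences (90% identity)" ;;
        650.1|650.2) echo "BLAT tabular output against the database" ;;
        *)           echo "unknown file ID: $1" >&2; return 1 ;;
    esac
}
```

For example, `describe_file_id 100.1` prints preprocess.passed.fna, and an ID that is not in the list returns a non-zero status that a script can test for.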

Reply modified 16 months ago • written 16 months ago by Zhilong Jia

Hello, what language is this besides curl? I have a Windows computer and am using curl through the command prompt, and would like to do something similar to what you described.

Reply written 4.7 years ago by kbrannen

It's Bash. I presume you could get Bash working on Windows with Cygwin or something but I haven't used Windows since XP so I can't really say. Alternatively, it probably doesn't take much effort to make a simple while read line loop with something that works on Windows by default like Python (?) or Java (?). If you plan to do lots of bioinformatics in the future, I suggest you ditch Windows for Linux or OS X.

Reply modified 4.7 years ago • written 4.7 years ago by 5heikki

Update: I downloaded Git Bash for Windows, and I think I am having success using Bash and curl with the command you listed. I am not very experienced with Bash yet, so I haven't added any echo tests to see if the script is working the way I think it is. I do agree with you that Windows is a hassle, but there are often workarounds. I will continue down this line for the people who use Windows and cannot afford to, or do not want to, switch operating systems.
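
One way to add such echo checks is sketched below. It assumes the downloads were saved as <id>.gz in the current directory, as in the loop from the answer above; check_download and check_all are hypothetical helper names. The sketch only reports whether each expected file exists and is non-empty, and does not validate the contents.

```shell
# Report whether the download for one metagenome ID exists and is non-empty.
check_download() {
    local f="$1.gz"
    if [ -s "$f" ]; then
        echo "OK      $f"
    else
        echo "MISSING $f"
        return 1
    fi
}

# Check every ID in a list file (one ID per line).
# Returns non-zero if at least one download is missing or empty.
check_all() {
    local id_file="$1" line status=0
    while read -r line; do
        line="$(printf '%s' "$line" | tr -d '\r')"   # tolerate Windows line endings
        [ -z "$line" ] && continue                   # skip blank lines
        check_download "$line" || status=1
    done < "$id_file"
    return "$status"
}
```

Running `check_all keywordTableSortedUniqueIds.txt` after the download loop prints one OK or MISSING line per ID, which makes it easy to spot IDs that need to be fetched again.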

Reply written 4.6 years ago by kbrannen

16 months ago, Dattatray Mongad (National Centre for Cell Science, Pune) wrote:

I have a question. I have downloaded project files using the MG-RAST tools: mg-download.py --project projectid

But now I want the metadata for a particular project (which file corresponds to which sample type?). How can I download the metadata?
