Parsing a Megan (RMA) File
4.8 years ago

Hi,

I'm having a problem with MEGAN 5. I'm working with some quite large RMA files (approx. 40 GB), built using the trimmed reads and a BLAST run. The problem is that my (poor little) PC always hangs when I try to open them. So, I was wondering if there's a way to parse a file via the MEGAN command line, dividing it into "Bacteria", "Archaea" & "Viruses" or whatever, so the files become a little bit smaller. Thanks!

Bioinformatics Metagenomics Megan • 2.5k views
4.8 years ago

I'm not very familiar with MEGAN, but I would imagine a BLAST search could get quite time-consuming if searching a database of all known metagenomics sequences, especially if you have over 1 million reads.

Did you amplify ribosomal RNA sequences? If so, these are some programs that should provide less computationally intensive options:

1) RDPclassifier (web-based or local .jar file) - https://rdp.cme.msu.edu/classifier/classifier.jsp

2) mothur - http://www.mothur.org/

3) QIIME - http://qiime.org/

They don't work with RMA files, but you must have had some sort of sequence to produce the RMA file. If you have .fastq files, mothur and QIIME can take those as an input (and convert to .fasta file, if you wanted to try the RDPclassifier as a standalone tool).
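For the FASTQ-to-FASTA conversion mentioned above, a minimal Python sketch could look like the following. It assumes plain (uncompressed) FASTQ with exactly four lines per record; the function names `fastq_to_fasta` and `convert_file` are made up for illustration and are not part of mothur, QIIME, or the RDP classifier.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines (header, then sequence) from 4-line FASTQ records.

    Assumption: each record is exactly four lines (header, sequence,
    '+' separator, qualities); multi-line FASTQ is not handled.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record")
        yield ">" + header[1:]      # FASTA header keeps the read name
        yield seq.rstrip("\r\n")    # quality line is simply dropped


def convert_file(fastq_path: str, fasta_path: str) -> None:
    """Stream a FASTQ file to a FASTA file without loading it into memory."""
    with open(fastq_path) as src, open(fasta_path, "w") as dst:
        for line in fastq_to_fasta(src):
            dst.write(line + "\n")
```

Because the file is streamed record by record, memory use stays flat even for large read sets.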


Thanks Charles for replying. Yes, I have the 16S sequences and I've already used QIIME. But now I'm working on the WGS reads and I wanted to do another taxonomic analysis (because by using 16S, you leave behind viruses and eukaryotes). That's why I tried MEGAN. For the BLAST part, I used DIAMOND, which is waaay faster than regular BLAST (although each search takes almost a day). I got that one covered, but the results I get are killing my PC (still waiting for budget approval to buy a new one...)


Ok - I haven't tested any of the following programs, but it is possible that it might help you to use a different method to quantify species abundances that doesn't depend upon your BLAST file:

http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x

http://www.ccb.jhu.edu/software/centrifuge/

This one is really for transcriptomes, but it might still work:

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0969-1

http://taxonomer.iobio.io/


Hi, did you find a solution to this problem? I am facing the same problem. My RMA files are only about 2 GB to 3 GB in size, but my PC still hangs. If you found a solution, please explain, because that would be really helpful.

Thanks


I am not sure what to tell you about RMA files, since I don't typically work with Megan.

In general, there was some public eDNA re-analysis where I tried out various options:

https://github.com/cwarden45/PRJNA513845-eDNA_reanalysis/blob/master/metagenomics/README.md

Running MEGABLAST does take a while, even when prioritizing more highly expressed sequences (unless you go even further). In that situation, I was specifically looking for artifacts, so I was trying to look at less common things.

However, in other situations, maybe looking only at sequences present at >1% (or even >1/10,000, for identical sequences) might help?
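As a rough illustration of that kind of abundance cutoff, here is a small Python sketch that collapses identical sequences and keeps only those above a chosen fraction of all reads. The function name `abundant_sequences` and the `min_fraction` parameter are hypothetical, and this is not the procedure used in the linked re-analysis.

```python
from collections import Counter
from typing import Dict, Iterable


def abundant_sequences(seqs: Iterable[str], min_fraction: float = 0.01) -> Dict[str, int]:
    """Return {sequence: count} for identical sequences whose share of
    all reads is strictly greater than min_fraction (0.01 means >1%)."""
    counts = Counter(seqs)          # collapse identical sequences
    total = sum(counts.values())    # total number of reads seen
    if total == 0:
        return {}
    return {s: n for s, n in counts.items() if n / total > min_fraction}
```

Calling it with `min_fraction=1e-4` would correspond to the looser >1/10,000 cutoff. Only the surviving sequences would then need to be searched, which should shrink the input to the alignment step considerably.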

Also, assuming that you don't have human reads without consent for public deposit (or you filter the human reads), the SRA has some taxonomy assignments. For that eDNA project, you can see some selected notes here:

https://github.com/cwarden45/PRJNA513845-eDNA_reanalysis/blob/master/extended_summary.xlsx (if you download the file to view locally).

In other words, if you haven't already deposited your data in the SRA, that is generally a good practice and might be helpful for analysis in some situations?
