I'm trying to extract rRNA sequences from my metagenomic datasets in order to perform diversity analysis in QIIME. The overall goal is to compare the microdiversity of one target species as reported by MUMmer, using the entire dataset, with the microdiversity reported by the QIIME diversity analysis.
So I chose rRNASelector and Metaxa to extract the sequences.
I'm having trouble running rRNASelector on a Unix server over ssh.
It starts, the dialog box opens on my screen, and I choose the reads file (FASTA), but then I get this error message:
Starting hmmsearch for forward using lib/16s_bact_for3.hmm Not right file format! -> Abort
My HMMER installation seems fine: when I type
hmmsearch -h
it prints the help information.
I've tried running it with every possible path for hmmsearch, but it just crashes. Besides this, the dialog box is so slow that I can't even change the length parameter...
In parallel, I'm running Metaxa, and it seems to work well, but it is finding too few sequences in my metagenomic dataset. It is a low-richness dataset, in fact, but even so I was expecting more. My output for bacterial sequences is 61 reads out of a total of 110504 sequences. These sequences are Illumina paired-end reads merged with FLASH, with an average of 81% of pairs combined and an average length of 293.
The sequences used for rRNASelector are the same.
If anyone has experience with this kind of workflow, I would appreciate any suggestions, because I don't have much experience and may be missing something or doing something wrong.
Thank you so much!
If you copy/paste your exact rRNASelector commands there is a higher chance of getting help.
Are you using the command line or a graphical session over ssh? A graphical X server session over ssh can be really slow.
Hello, yes, I am trying to use the graphical session over ssh. I just typed
java -jar RNAselector.jar
and then the dialog box opens, but it's really slow... it's not working... Could I run it just from the command line? It would be great!
You could try EMIRGE. Indeed, it seems rRNASelector is a GUI-only application.
61 reads out of 110,000 might not be too far off what you expect. I have had 1 in 1,000 reads from a metagenome map to SSU rRNA, and that was after removal of the human sequences from an oral sample. Although maybe you left off a digit or two from the number of starting reads?
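To put the two figures side by side, here is a minimal Python sketch of the arithmetic. The read counts are the ones quoted in this thread; the 1 in 1,000 rate is only the anecdotal figure from the oral sample mentioned above, not a general constant, and the variable names are just for illustration.

```python
# Observed Metaxa hits versus the rough "1 in 1,000" rate quoted above.
total_reads = 110_504      # merged reads given to Metaxa (from the question)
observed_hits = 61         # bacterial SSU reads reported (from the question)
rule_of_thumb_rate = 1 / 1000  # anecdotal, from a single oral metagenome

observed_rate = observed_hits / total_reads
expected_hits = total_reads * rule_of_thumb_rate

print(f"observed rate: {observed_rate:.4%} "
      f"(about 1 in {round(1 / observed_rate)} reads)")
print(f"hits expected at 1 in 1,000: about {expected_hits:.1f}")
```

So the observed count is somewhat lower than that rule of thumb would suggest, but within the same order of magnitude.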
Thank you for all the suggestions!
I am working with the output from Metaxa2, since it seems pretty accurate.
This is my extraction summary. I would be grateful if you could comment and tell me what you think of these ratios:
I have two datasets of metagenomic sequences (Illumina paired-end, demultiplexed). Each dataset has 2 samples. The input for the Metaxa2 pipeline was the original fastq files (R1, R2) for each sample. My extraction summary for bacterial SSU rRNA:
For the first dataset, an oligotrophic freshwater sample (very low DNA content), I got 149 SSU rRNA sequences out of a total of 135069 sequences for one sample and 399 SSU rRNA sequences out of 416595 for the other. For the second dataset, a hypereutrophic freshwater sample (a lot of DNA), I got 2070 rRNA sequences out of a total of 3008033 for one sample and 1448 out of a total of 2405944 for the other.
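For convenience, the ratios can be worked out with a short Python sketch like the one below. The counts are the ones listed above; the sample labels are just placeholders, and the sketch assumes the "total" values are the numbers of sequences Metaxa2 actually searched.

```python
# Fraction of sequences classified as bacterial SSU rRNA, per sample.
# Each entry is (SSU rRNA hits, total sequences); labels are placeholders.
samples = {
    "oligotrophic, sample 1": (149, 135_069),
    "oligotrophic, sample 2": (399, 416_595),
    "hypereutrophic, sample 1": (2070, 3_008_033),
    "hypereutrophic, sample 2": (1448, 2_405_944),
}

fractions = {}
for label, (hits, total) in samples.items():
    fractions[label] = hits / total
    print(f"{label}: {hits} / {total} = {fractions[label]:.4%}")
```

All four samples come out at roughly 0.06% to 0.11% of sequences.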
Thank you for any comment!