I am new to metagenomic studies. My co-adviser introduced me to KBase for analysis, and I followed the tutorial narrative "Genome Extraction from Shotgun Metagenome Sequence Data."

So far I have only paired the reads and removed the adapters and low-quality reads. I then ran them through Kaiju for taxonomic classification, with resolution down to species level. I've read some journal articles that do annotation without assembly, but I have not read of anyone doing the same thing with 16S data.

I may have overlooked that the platform mainly supports shotgun metagenomics. I'm not confident in the results and plan to run a common pipeline such as the MiSeq SOP from mothur. But are my results valid, even with low confidence/accuracy?


Hi piercemanlangit, it's not clear to me whether what you sequenced is 16S amplicons or a whole metagenome. Most of the first two paragraphs read as if you are dealing with whole-metagenome sequencing data, then you suddenly switch to 16S.

The distinction is important. There are many more options with whole-metagenome shotgun sequencing (WMGS), and the methods for 16S are naturally very different.

For example, a key element of Kaiju is "protein-level classification". In my opinion, this won't work so well with 16S rRNA, since the 16S rRNA gene is not translated into protein ;-)

Hi Carambakaracho, I'm dealing with 16S amplicons (V3-V4). KBase offers a point-and-click interface, so it was easy to navigate. The results were also congruent with our culture results, so I thought there was no problem.

Does this mean the results are erroneous? I'm just wondering whether I could still use them.


Well, if the results confirm what you know, it might have worked somehow. That's the thing with classification: it always works. However, sometimes you'll find anthrax in the subway or a platypus in northern Europe.

If this will be part of a publication and the referee understands something about 16S classification, that referee might dismiss the method.

For reference, you mentioned mothur, a very popular software package for 16S analysis; there's QIIME, too. MEGAN provides a GUI, as far as I know. Recently, the FROGS pipeline gave me excellent results.

If you want to identify the sequences to species level, then simply running BLAST is maybe the best option. The identity and coverage will serve as your confidence/accuracy score. If you also want to assign a higher taxonomic rank to sequences with no significant BLAST hit, you could use MEGAN, the RDP classifier, or SINTAX.
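
To make the identity/coverage idea concrete, here is a minimal Python sketch. It assumes the BLAST results were written in tabular form with `-outfmt "6 std qcovs"`, i.e. the twelve standard columns plus query coverage. The function names, the demo rows, and the 97% identity / 90% coverage cutoffs are my own assumptions for illustration, not fixed rules; 97% is only a commonly used rule of thumb and does not guarantee a correct species call for a V3-V4 fragment.

```python
import csv
import io

# Column layout assumed for: blastn ... -outfmt "6 std qcovs"
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore", "qcovs",
]


def best_hits(handle):
    """Return the highest-bitscore hit for each query sequence."""
    best = {}
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        # Convert the numeric fields we compare on.
        row["pident"] = float(row["pident"])
        row["qcovs"] = float(row["qcovs"])
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


def classify(hit, min_identity=97.0, min_coverage=90.0):
    """Label a hit using hypothetical identity/coverage cutoffs."""
    if hit["pident"] >= min_identity and hit["qcovs"] >= min_coverage:
        return "species-level candidate"
    return "fall back to a higher-rank classifier"


if __name__ == "__main__":
    # Invented rows, for illustration only (not real results).
    demo = (
        "otu1\tsubjA\t99.5\t420\t2\t0\t1\t420\t10\t429\t0.0\t760\t100\n"
        "otu1\tsubjB\t98.0\t420\t8\t0\t1\t420\t10\t429\t0.0\t730\t100\n"
        "otu2\tsubjC\t91.0\t300\t27\t0\t1\t300\t5\t304\t1e-100\t400\t71\n"
    )
    for query, hit in best_hits(io.StringIO(demo)).items():
        print(query, hit["sseqid"], hit["pident"], hit["qcovs"], classify(hit))
```

Queries that end up in the fallback group are the ones you would hand to MEGAN, the RDP classifier, or SINTAX for a genus- or family-level assignment.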

MEGAN has a UI, but most of its input needs to be generated with command-line tools.

