Question: Microbial diversity analysis using whole-genome metagenomic data
3.1 years ago by
elsoja120 wrote:

I have data from a single metagenomic DNA sample: two MiSeq FASTQ files (R1 and R2), which I merged using PEAR.

Now I want to estimate the abundances of the bacterial taxa to generate a figure like this one:


Figure from: Panosyan, Hovik, and Nils‐Kåre Birkeland. "Microbial diversity in an Armenian geothermal spring assessed by molecular and culture‐based methods." Journal of basic microbiology 54.11 (2014): 1240-1250.

The problem is that the 16S region was not amplified before sequencing, because the goal was to discover new genes. I've already isolated the 16S reads from my sample using SortMeRNA, but software that performs OTU picking, taxonomic assignment, and diversity analysis (such as mothur and QIIME) seems to require that all reads come from the same region of the 16S gene.

Is there a way to use the 16S reads that I filtered with SortMeRNA in a diversity analysis with mothur/QIIME?

qiime metagenome taxonomy mothur • 2.2k views
ADD COMMENTlink modified 3.1 years ago by Brian Bushnell17k • written 3.1 years ago by elsoja120

Cross-posted on StackExchange

ADD REPLYlink written 3.1 years ago by elsoja120
3.1 years ago by
Genomic Island
Bioinformatics_NewComer320 wrote:

If you have access to a computing cluster, then metagenomic tools like CLARK-S would be good to try. They give you abundances and allow you to perform other analyses.

ADD COMMENTlink written 3.1 years ago by Bioinformatics_NewComer320

Thank you. From what I've read, tools like CLARK, Kraken and Kaiju are the answer to my problem.

ADD REPLYlink written 3.1 years ago by elsoja120
3.1 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

You could try assembling your 16S reads with an assembler that deals well with branches (possibly SPAdes), then aligning the resulting assemblies to other 16S sequences and trimming off the bases that extend past the ends of the alignment (and are thus not 16S bases). I'm not sure how well that would work; it depends on the data.

But I think your best bet would be to use the shotgun data as shotgun data, instead of trying to shoehorn it into 16S-based tools. We commonly assemble the whole metagenome, map the reads to the assembly to calculate coverage, and then align the contigs to existing databases like RefSeq to find out what they are. Once you know that contig_123 maps to E. coli and has 43x coverage, you can say you probably have about 43x coverage of E. coli in your data. Whether this approach works depends on whether you have enough data to assemble; if only, say, 10% of your reads map to the assembly, then it's pretty much a failure and you'll need a different method.
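To make the coverage idea concrete, here is a minimal Python sketch of how per-contig coverage could be rolled up into per-organism abundance. It is only an illustration: the function name, the input tuples, and the example contigs other than contig_123 are made up, and it assumes you have already obtained each contig's taxon (from the database alignment), its length, and its mean coverage (from read mapping).

```python
from collections import defaultdict


def taxon_abundance(contigs):
    """Summarise contig coverage per taxon.

    contigs: iterable of (taxon, length_bp, mean_coverage) tuples.
    This input layout is a hypothetical one chosen for the example.
    Returns (coverage, fraction): the length-weighted mean coverage per
    taxon, and each taxon's share of the summed coverage.
    """
    covered_bases = defaultdict(float)  # sum of length * coverage per taxon
    total_length = defaultdict(int)     # sum of contig lengths per taxon

    for taxon, length_bp, mean_cov in contigs:
        covered_bases[taxon] += length_bp * mean_cov
        total_length[taxon] += length_bp

    # Weight by length so that short contigs do not dominate the estimate.
    coverage = {t: covered_bases[t] / total_length[t] for t in covered_bases}

    # Coverage is roughly proportional to genome copies, so the share of
    # total coverage serves as a rough relative abundance.
    total_cov = sum(coverage.values())
    fraction = {t: c / total_cov for t, c in coverage.items()}
    return coverage, fraction


# Made-up example values, not real data.
example = [
    ("E. coli", 50_000, 43.0),      # e.g. contig_123
    ("E. coli", 30_000, 41.0),
    ("B. subtilis", 40_000, 10.0),
]
coverage, fraction = taxon_abundance(example)
for taxon in coverage:
    print(f"{taxon}: {coverage[taxon]:.2f}x, {fraction[taxon]:.1%} of total coverage")
```

The length-weighted mean is just one reasonable choice; a median over contigs would be more robust to repeats and to contigs assigned to the wrong organism.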

One thing to try in that case is to compare the reads directly to RefSeq to find out which organisms they came from. You can get a list of organisms observed in your data with BBMap like this: in=data.fq refseq records=400

Once you know which organisms are present, you can download their genomes and map reads to them for quantification purposes. Mapping to all of RefSeq directly normally takes too much time or memory to be practical.

You might also check out Kraken, which looks like it is designed for this purpose. I have not tried it, though.

ADD COMMENTlink modified 3.0 years ago • written 3.1 years ago by Brian Bushnell17k

Hi all,

Sorry to revive an old thread, but this may be helpful. How can I study the microbiome (i.e., the microbial flora) using whole-genome sequencing data from healthy people belonging to a given population? Could you suggest a procedure or any pipelines?

Thank you

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by seta1.4k

Please create a new thread. Thanks.

ADD REPLYlink written 2.2 years ago by Bioinformatics_NewComer320