Question: Ribosomal reads in shotgun metagenomics data
grp2009 wrote (15 months ago):

I am looking at some shotgun (not amplicon) metagenomics data, and have observed that among the reads that are classified as belonging to specific bacteria, most are from ribosomal genes (as determined later by BLAST). This is despite the fact that this is not targeted amplicon sequencing. My interpretation is that most of the bacteria in the sample are absent from the reference database used for classification, but that due to the high level of conservation of the ribosomal genes, these are still appearing in the classification results because those portions of the genomes are "close enough" to previously sequenced genomes.

My first question is: is this a plausible interpretation of what I'm observing? Follow-up: is it a common issue with shotgun metagenomics? Secondly (let me know if this should be a separate question): is there an efficient way to "fish out" previously unclassified reads based on their overlap with a particular set of ribosomal reads from the data? I suppose this would amount to doing genome assembly, but using certain selected reads as a target or seed for assembly.

Background: What we have is Illumina paired-end (2x150 bp) data from shotgun metagenomics, which I have run through Kraken (using the 8 GB MiniKraken database). The first thing I notice is that 99.9% of the reads are unclassified. That seems to hold true with other methods of classification (MetaPhlAn2 and a cursory BLAST of a few reads). A small fraction of reads are classified as belonging to certain bacteria. I mapped those reads to the corresponding genome using Bowtie2, hoping to validate the presence of that bug in the sample. After mapping, I see very clear peaks in coverage, rather than reads mapping throughout the genome. Furthermore, the mapped reads BLAST to ribosomal sequences.
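
To put a number on "peaks versus reads throughout the genome", one option is to compare breadth of coverage (the fraction of reference positions hit at least once) with the depth at the covered positions. A very small breadth combined with a high depth is the pattern described above. The following is only a minimal sketch: it assumes the alignment intervals have already been pulled out of the Bowtie2 output as 0-based, half-open (start, end) pairs, and the function name coverage_summary is made up for illustration.

```python
def coverage_summary(intervals, genome_length):
    """Summarise coverage from 0-based, half-open (start, end) alignment intervals.

    Hypothetical helper for illustration; not part of any existing tool.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")

    # Difference array: +1 where an alignment starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        start = max(0, start)
        end = min(genome_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # Running sum of the difference array gives per-position depth.
    depth = []
    running = 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)

    covered = sum(1 for d in depth if d > 0)
    return {
        "breadth": covered / genome_length,
        "mean_depth_covered": (sum(depth) / covered) if covered else 0.0,
        "max_depth": max(depth),
    }


if __name__ == "__main__":
    # Three reads piled onto one small region of a 100 bp toy reference.
    print(coverage_summary([(10, 20), (12, 22), (15, 25)], 100))
```

If the organism were really present, breadth would be expected to rise with sequencing depth across the whole reference, not stay confined to the rRNA operons.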

modified 14 months ago by Biostar • written 15 months ago by grp2009

is there an efficient way to "fish out" previously unclassified reads based on their overlap with a particular set of ribosomal reads from the data?

You can bin/fish out reads from a dataset with bbsplit.sh from BBMap and a list of reference sequences you are interested in.

I suppose this would amount to doing genome assembly, but using certain selected reads as a target or seed for assembly.

How so?

written 15 months ago by genomax

From what I understand, bbsplit would allow me to map reads to reference genomes. In contrast, I'm talking about reads that do not map to any known reference genome. I would like to connect these unmapped reads (by overlapping/assembling) to the reads that mapped (imperfectly) to known ribosomal genes. In essence, this amounts to doing assembly of the metagenomic reads, and then picking the contig that includes the ribosomal sequence of interest. But it would be nice to be able to do this without attempting a complete assembly of all the reads.

To put it another way: in my metagenomic data I have reads that are "close" to the ribosomal genes of bacterium B (close enough to map or BLAST). But I don't think that I have bacterium B in the sample, because the rest of its genome is entirely missing from my data. Instead, I think I have some other related bacterium, X, which is not represented in the current reference databases. I want to try to get as much as possible of X's genome, by seeing what reads overlap with the ribosomal reads (hence my use of the word "assembly").
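
The idea of seeding on the ribosomal reads and pulling in whatever overlaps them can be sketched as iterative k-mer "baiting": collect the k-mers of the seed reads, recruit every read that shares a k-mer with that set, add the recruited reads' k-mers to the set, and repeat. The Python below is a rough illustration under stated assumptions, not a replacement for an assembler: reads are held in memory as a dict of name to sequence, exact k-mer matches stand in for overlaps, and the names recruit_reads, canonical_kmers, min_shared and max_rounds are invented for this example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Set of canonical k-mers (the smaller of k-mer and its reverse complement)."""
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            continue  # skip k-mers containing ambiguous bases
        out.add(min(kmer, revcomp(kmer)))
    return out


def recruit_reads(seed_seqs, reads, k=31, min_shared=1, max_rounds=5):
    """Iteratively recruit reads sharing k-mers with a growing bait set.

    seed_seqs: iterable of seed sequences (e.g. the reads that hit rRNA genes).
    reads: dict mapping read name to sequence.
    Returns a dict mapping each recruited read name to the round it was added in.
    """
    bait = set()
    for seq in seed_seqs:
        bait |= canonical_kmers(seq, k)

    read_kmers = {name: canonical_kmers(seq, k) for name, seq in reads.items()}
    recruited = {}

    for round_no in range(1, max_rounds + 1):
        # Decide this round's recruits against the bait set as it stood
        # at the start of the round, then extend the bait set.
        new = [
            name for name, ks in read_kmers.items()
            if name not in recruited and len(ks & bait) >= min_shared
        ]
        if not new:
            break
        for name in new:
            recruited[name] = round_no
            bait |= read_kmers[name]

    return recruited
```

The recruited reads would then be handed to an ordinary assembler, which is a much smaller job than assembling the whole dataset. One caveat with this kind of walk: because rRNA genes are conserved across many taxa, the first rounds can pull in reads from several organisms at once, so max_rounds and min_shared probably need to be kept conservative.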

written 15 months ago by grp2009