Question: Illumina PE Reads Mapping To Reference Genome
6.8 years ago by
rohita.sinha0 wrote:

I have a set of around 300 metagenomic samples of paired-end (PE) Illumina reads. Assembling these reads gives me fairly long contigs, and I can map around 80-90% of my reads back to these contigs.

But when I map the same reads to ~5000 complete microbial reference genomes, only a small fraction of the reads (2-8%) map. Even setting aside the fact that I can map my reads back to my contigs, I am surprised to see a metagenomic sample with only 2-8% of its reads coming from known microbial genomes.

I have used bowtie2 with default --end-to-end as well as --local settings.
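In case it helps to pin down how the mapped percentage is being counted, here is a minimal sketch that computes it directly from SAM records. It assumes the aligner output is plain-text SAM, and it counts each read record once by skipping secondary and supplementary alignments. The function name `mapped_fraction` is made up for illustration; the flag bits are the ones defined in the SAM specification.

```python
from typing import Iterable

# FLAG bits from the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapped_fraction(sam_lines: Iterable[str]) -> float:
    """Return the fraction of primary read records that are mapped.

    `mapped_fraction` is a hypothetical helper, not part of any aligner.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read record only once
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0
```

It can be fed an open file handle, e.g. `mapped_fraction(open("sample.sam"))`, where `sample.sam` is a placeholder name. Comparing this number for the contig mapping and the reference-genome mapping should confirm that both percentages are computed the same way.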

Can anyone guess what the likely explanation is?

Did you BLAST your contigs to see what they map to?
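As a rough sketch of how the results of such a search could be summarized, the snippet below keeps the top-scoring hit per contig. It assumes the search was saved in BLAST's default 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name `best_hits` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_hits(blast_lines: Iterable[str]) -> Dict[str, Tuple[str, float, float]]:
    """Map each query contig to (subject, percent identity, bitscore)
    of its highest-bitscore hit.

    Assumes default 12-column BLAST tabular output; `best_hits` is a
    hypothetical helper for illustration.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident = float(fields[2])     # column 3: percent identity
        bitscore = float(fields[11])  # column 12: bit score
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best
```

Contigs that are absent from the returned dictionary had no hit at all, and contigs whose best hit has low percent identity are candidates for taxa that are missing from the reference set.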

written 6.8 years ago by Andreas2.4k
6.8 years ago by
Lee Katz3.0k, Atlanta, GA, wrote:

I have two guesses, one computational and one biological. I have no idea if I'm actually right, but hopefully it can get you onto the right path.

  1. You are getting misassemblies. Is there a better assembler to use (e.g. Ray Meta)? Are your parameters too aggressive? You can alter the settings by increasing the overlap length, etc.
  2. Your metagenomic samples contain many new taxa that have not been characterized yet. Those taxa would therefore have no representatives among your reference genomes.

+1. My guess is that you are right on both counts, Lee: the contigs contain misassemblies, and there is a lot of biological diversity out there that we haven't tapped yet. This is one of the reasons I think people with metagenomic data should identify reads first and then take the time to assemble with algorithms designed to deal with extreme diversity.

written 6.7 years ago by Josh Herr5.7k
