My first question: what are the uses/applications of reference genome mapping?

Second, assuming that I have sequenced an environmental sample for metagenomic analysis, since I have no idea of its taxonomic composition, how would I decide which genome I should use as a reference for mapping my reads?

Hi Monzoor. Here are a few tips for writing better questions. 1) Describe what you are doing. For people to help you, they must have a good understanding of what you are trying to accomplish. 2) Describe what you have tried so far, so that you don't get answers that tell you to do stuff you have already done. 3) Ask a specific question, including a question mark! This seems obvious, but I had to add 3 question marks to your questions and remove one from a sentence that was clearly not a question. 4) In order to get informative answers, you must write informative and well-formatted questions. :)

You can also edit your question (i.e. this question), as soon as you have the time, to take Eric's suggestions into account.

I agree. Will be careful next time.

Hi Monzoor, answers from my experience, hope they help:

1) In my understanding, single reference genome mapping is used when you have sequence reads that can be linked to a single reference genome (or very few). Applications of reference genome mapping include, among others:

  • re-sequencing of genomic DNA
  • genome sequencing of individuals for variant detection (SNPs, copy number variations)
  • sequencing of new species closely related to reference to guide assembly
  • RNA-seq
  • ChIP-seq

2) Metagenomics was not in that list. You already named the reason: you cannot know what you will find in that complex mixture of DNA. Since you cannot choose a reference a priori, the best bet is to BLAST against all available sequences, for example the full NCBI nt BLAST database. Metagenomics is often concerned with microbial communities, so you could restrict the search to prokaryotic sequences, at the cost of neglecting any eukaryotic contribution or contamination.
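To make the BLAST step a bit more concrete, here is a minimal sketch of how one might reduce the search results to a single best hit per read. It assumes the results were saved in BLAST's standard 12-column tabular format (query ID in column 1, subject ID in column 2, e-value in column 11, bit score in column 12). The function name `best_hits`, the `max_evalue` cutoff, and the toy lines are my own illustration, not anything from a real run.

```python
def best_hits(blast_lines, max_evalue=1e-5):
    """Keep the highest-bit-score hit per read from BLAST tabular lines.

    Assumes the standard 12 tab-separated columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    The max_evalue cutoff is an arbitrary illustrative default.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # discard weak hits
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Toy input (made-up reads and subjects, for illustration only).
example = [
    "read1\trefA\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t150",
    "read1\trefB\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180",
    "read2\trefA\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.5\t30",
]
hits = best_hits(example)
```

With the toy input, `read1` keeps its stronger hit to `refB`, and `read2` is dropped because its only hit fails the e-value cutoff. Best-hit assignment is only one simple strategy; dedicated taxonomic binning tools do something more careful.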

Reference genome mapping can still be useful in metagenomics, for example when you find a predominant taxonomic assignment in the first step. You can then try to map the reads to, or assemble them using, the genome of the predominant taxon/organism. We tried this, for example, in an analysis of the metagenome of a biogas plant, and there are other publications doing something similar.
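As a rough sketch of the "find a predominant taxon first" idea: once each read has some taxonomic assignment, you can simply count assignments and check whether one taxon dominates. The function name `predominant_taxon`, the `min_fraction` threshold, and the toy assignments below are hypothetical and only illustrate the logic; what counts as "predominant" is a judgment call for your own data.

```python
from collections import Counter


def predominant_taxon(read_to_taxon, min_fraction=0.5):
    """Return (taxon, fraction) for the most common taxon, or None.

    read_to_taxon maps read IDs to a taxon label.
    min_fraction is an arbitrary illustrative threshold.
    """
    if not read_to_taxon:
        return None
    counts = Counter(read_to_taxon.values())
    taxon, n_reads = counts.most_common(1)[0]
    fraction = n_reads / len(read_to_taxon)
    if fraction < min_fraction:
        return None  # no single taxon dominates the assigned reads
    return taxon, fraction


# Toy assignments (made up, for illustration only).
assignments = {
    "read1": "Methanoculleus",
    "read2": "Methanoculleus",
    "read3": "Clostridium",
    "read4": "Methanoculleus",
}
result = predominant_taxon(assignments)
```

If a taxon is returned, its genome becomes a candidate reference for mapping; if nothing dominates, mapping to a single reference is probably not the right tool.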

Citing the abstract as an example:

A significant portion of contigs was allocated to the genome sequence of the archaeal methanogen Methanoculleus marisnigri JR1. Mapping of single reads to the M. marisnigri JR1 genome revealed that approximately 64% of the reference genome including methanogenesis gene regions are deeply covered.
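A figure like "approximately 64% of the reference genome is covered" is a breadth-of-coverage statement. Here is a minimal sketch of how such a fraction can be computed from the positions of mapped reads, by merging overlapping intervals. The function name `covered_fraction`, the 0-based half-open interval convention, and the toy numbers are my own assumptions for illustration; this is not the method or data of the cited paper.

```python
def covered_fraction(alignments, genome_length):
    """Fraction of the reference covered by at least one read.

    alignments: iterable of (start, end) positions on the reference,
    0-based and half-open (an assumed convention for this sketch).
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(alignments):
        if current_end is None or start > current_end:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent read: extend the current interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Toy example: three reads on a 100 bp "genome" (made-up values).
fraction = covered_fraction([(0, 40), (30, 60), (90, 94)], 100)
```

In the toy example the first two reads overlap and merge into one 60 bp stretch, and the third adds 4 bp, so 64 of 100 positions are covered. In practice you would get the read positions from the alignment file produced by your mapper.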

