Question: Looking for a tool for analyzing human microbiome data
benzhang wrote (3.2 years ago):

Hi all! I'm currently working on an analysis project using human microbiota data from the NIH Human Microbiome Project. I want to search the microbiome data for the presence of certain bacteria and their relative abundance, using the reference bacterial genomes provided on the NIH Human Microbiome Project website. To be a little more specific, I want to map the microbiome sequence reads to a specific bacterial reference genome at 97% sequence similarity (excluding 16S). Can anyone recommend a tool or software that can accomplish this? Thank you!!

Tags: alignment, genome

natasha.sernova wrote (3.2 years ago):

See the links below:

This is the main site:

http://hmpdacc.org/

Its tools:

http://hmpdacc.org/resources/tools_protocols.php

Analyzing the Human Microbiome: A "How To" Guide for Physicians:

http://www.medscape.com/viewarticle/828715_3

It is based on this paper:

http://www.nature.com.sci-hub.cc/ajg/journal/v109/n7/full/ajg201473a.html

Better understand the role of microbes in human health and disease:

http://www.illumina.com/areas-of-interest/microbiology/human-microbiome-analysis.html

Disease cases:

https://commonfund.nih.gov/hmp/programhighlights

Old, but highly cited paper:

Human Microbiome Analysis

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002808

Some people looked even here:

Ancient human microbiomes

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4312737/


Thank you, Natasha!

written 3.2 years ago by benzhang

Brian Bushnell (Walnut Creek, USA) wrote (3.2 years ago):

BBMap has a couple relevant flags - idfilter and idtag. For example:

bbmap.sh in=reads.fq outm=mapped.sam ref=genomes.fa idfilter=0.97

...will only keep alignments with at least 97% identity, and "outm" rather than "out" means only the mapped reads will be printed to the sam file. The idtag flag adds a field to the output indicating each read's identity to the reference.
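If you want to check which identity values actually ended up in the output, something like the sketch below can pull them out of the sam file. It assumes the identity is written as an optional field whose text starts with YI:f: (that prefix is an assumption on my part, so look at a few lines of your own mapped.sam and pass a different prefix if it differs). list_identity is just a hypothetical helper name.

```shell
# Print "read name <TAB> identity value" for each alignment record in a SAM file.
# ASSUMPTION: the identity sits in an optional field beginning with "YI:f:".
# Pass another prefix as the second argument if your output uses a different one.
list_identity() {
    awk -F '\t' -v p="${2:-YI:f:}" '
        /^@/ { next }                          # skip SAM header lines
        {
            # optional fields start at column 12
            for (i = 12; i <= NF; i++) {
                if (index($i, p) == 1) {
                    print $1 "\t" substr($i, length(p) + 1)
                    break
                }
            }
        }
    ' "$1"
}
```

Usage would be along the lines of `list_identity mapped.sam | sort -k2,2n | head`, which shows the lowest identity values first so you can confirm nothing falls below your cutoff.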

Excluding 16S is a bit more difficult, though you can do that by masking ribosomal sequence prior to mapping. You can do that with BBMask. Alternatively, you could filter ribosomal reads from your data prior to mapping, using some database like Silva.

Typically, though, if you want to quantify the proportion of reads that came from various different references, I'd suggest using BBSplit or Seal, both of which are in the BBMap package. Seal is faster and easier to use; BBSplit is more accurate and uses less memory.


Thank you so much Brian!

written 3.2 years ago by benzhang

Hey, Brian! I've been using bbmap and it works great. However, I'm now looking at different metagenomic data and many different bacterial species at the same time. I was wondering whether I can create a custom database of all the bacterial genomes, so that I can map the metagenomic reads against all the species at once? Thanks!

Best, Ben

written 3.2 years ago by benzhang

Hi Ben,

You can simply concatenate all of the fasta references together, and map the reads to the combined file with BBMap. The sam output will indicate which reference each read mapped to. Does that answer your question?
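As a rough sketch of how to read that information back out: in the SAM format, column 3 (RNAME) is the reference sequence name and bit 4 of the FLAG in column 2 marks an unmapped read, so a small awk function can tally records per reference. count_reads_per_ref is a hypothetical helper name, and the counts are per alignment record and per sequence name (per contig), not per genome.

```shell
# Count mapped alignment records per reference sequence in a SAM file.
# Column 2 is the FLAG (bit 4 set = read unmapped), column 3 is the reference name.
count_reads_per_ref() {
    awk -F '\t' '
        /^@/ { next }                      # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }      # skip records flagged as unmapped
        { n[$3]++ }                        # tally by reference name
        END { for (r in n) printf "%s\t%d\n", r, n[r] }
    ' "$1" | sort
}
```

For example, `count_reads_per_ref mapped.sam` prints one line per reference sequence with its record count. If a genome has several contigs, you would still need to sum its contigs to get a per-genome number.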

-Brian

written 3.2 years ago by Brian Bushnell

Hi, Brian! Thanks for getting back to me. Do I just use the cat command-line tool? Thanks!

written 3.1 years ago by benzhang

Yep, cat will work fine.
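A minimal sketch of that, with two sanity checks that are worth running on the combined file. The function names (combine_refs, count_fasta_records, duplicate_names) and the file names are hypothetical, only there for illustration.

```shell
# Concatenate reference FASTA files into one combined file.
# Usage: combine_refs combined.fa x.fa y.fa ...
combine_refs() {
    combined=$1
    shift
    cat "$@" > "$combined"
}

# Count FASTA records (header lines starting with ">") in a file.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print sequence names (first word of each header) that occur more than once.
duplicate_names() {
    grep '^>' "$1" | awk '{ print $1 }' | sort | uniq -d
}
```

After `combine_refs combined.fa x.fa y.fa`, the record count of combined.fa should equal the sum of the counts of the inputs. If it is lower, one of the inputs probably lacked a final newline, so a header got glued onto the previous sequence line. If duplicate_names prints anything, two genomes share a sequence name and you will not be able to tell them apart in the mapping output.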

written 3.1 years ago by Brian Bushnell

Hi, Brian. Sorry, one more question. Can you show me the command line I should use if I want to quantify the relative amount of reads mapped to each reference with BBSplit? Thank you so much!

written 3.1 years ago by benzhang

More complete instructions are in the shellscript (bbsplit.sh), which you can read by opening it with a text editor or running it with no arguments. But in general, when you have multiple reference fasta files (for example, x.fa and y.fa) you would do this:

bbsplit.sh ref=x.fa,y.fa in=reads.fq basename=out_%.fq outu=unmapped.fq refstats=refstats.txt

This will produce three read files: out_x.fq, out_y.fq, and unmapped.fq, each containing the reads that mapped best to that reference (or for unmapped, did not map to any reference). refstats.txt will give counts of reads that mapped to each reference.
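If you also want the relative amounts as fractions, one rough way to cross-check the numbers is to count the reads in each output file and divide by the total. This is only a sketch: it assumes plain 4-line fastq records (no wrapped sequence lines), the helper names are hypothetical, and the fractions are shares of reads, not normalized for genome length.

```shell
# Count reads in a FASTQ file, assuming exactly 4 lines per record.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Print "file <TAB> reads <TAB> fraction of total" for each FASTQ file given.
relative_abundance() {
    total=0
    for f in "$@"; do
        total=$((total + $(count_fastq_reads "$f")))
    done
    for f in "$@"; do
        n=$(count_fastq_reads "$f")
        awk -v f="$f" -v n="$n" -v t="$total" \
            'BEGIN { printf "%s\t%d\t%.4f\n", f, n, (t > 0 ? n / t : 0) }'
    done
}
```

With the example above this would be `relative_abundance out_x.fq out_y.fq unmapped.fq`. Leave unmapped.fq off the list if you want fractions of mapped reads only.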

written 3.1 years ago by Brian Bushnell

Nari (United States) wrote (3.2 years ago):

Here is a recent paper on accurate read assignment to organisms:
http://nar.oxfordjournals.org/content/43/10/e69.full.pdf+html

They have developed GOTTCHA.
GOTTCHA is an application of a novel, gene-independent and signature-based metagenomic taxonomic profiling method with significantly smaller false discovery rates (FDR).


Thank you, Nari!

written 3.2 years ago by benzhang

benzhang wrote (3.2 years ago):

Thank you for the links, Natasha, and thank you for the tools, Brian! I really appreciate your help!


If you found what you need then please accept an answer (or wait for more if not completely satisfied)

written 3.2 years ago by WouterDeCoster