Question: Relative abundance from metagenomics samples
2.6 years ago by
David160
David160 wrote:

Hi, I have a gut metagenomics sample (WGS, Illumina 2x150 bp). I applied the following custom pipeline:

1 - Filtered my reads (to remove human contaminants and phiX)
2 - Assembly with MEGAHIT to get contigs
3 - Binning of the MEGAHIT contigs with MetaBAT
4 - Gene prediction on contigs with Prodigal (got genes and proteins)
5 - Assigned taxonomy to the (bins or genes) with Kaiju

My question is how to get the relative abundance of the species present in the sample. Should I map each of my genes back to my reads and simply count the mapped reads, then divide the number of mapped reads by the total number of reads to get the relative abundance? For example, if 100,000 reads map to one gene and my sample has 1M reads, can I assume that the relative abundance of that species is 10%? Is that correct, or how would you get the relative abundance?

Thanks for your comments.

modified 5 months ago by lagartija60 • written 2.6 years ago by David160
2.6 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

You should map your reads to the assembly. The abundance is the average coverage of a gene, not the number of reads mapping to it. For example, using BBMap:

bbmap.sh in1=r1.fastq in2=r2.fastq ref=genes.fasta out=mapped.sam covstats=covstats.txt

covstats.txt will tell you the average coverage of each gene, which is proportional to the abundance (ignoring bias).
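
A minimal sketch of how those coverage values could be turned into relative fractions, assuming covstats.txt is tab-separated with a header line whose first column is `#ID` and which contains an `Avg_fold` column. The function name and the example values are hypothetical, and the result is only each sequence's share of the summed average coverage (ignoring bias, as noted above), not a validated species abundance.

```python
import csv
import io


def relative_abundance_from_covstats(handle):
    """Return {sequence_id: share of the summed average coverage}.

    Assumption: tab-separated input with a header line that starts
    with "#ID" and includes an "Avg_fold" column.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # "#ID" -> "ID"
    id_col = header.index("ID")
    cov_col = header.index("Avg_fold")

    coverage = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        coverage[row[id_col]] = float(row[cov_col])

    total = sum(coverage.values())
    if total == 0:
        return {seq_id: 0.0 for seq_id in coverage}
    return {seq_id: cov / total for seq_id, cov in coverage.items()}


# Hypothetical two-gene table, for illustration only.
example = (
    "#ID\tAvg_fold\tLength\n"
    "gene_A\t30.0\t900\n"
    "gene_B\t10.0\t3000\n"
)
abundance = relative_abundance_from_covstats(io.StringIO(example))
for seq_id, fraction in abundance.items():
    print(f"{seq_id}\t{fraction:.2f}")
```

With a real file, the same function could be called as `relative_abundance_from_covstats(open("covstats.txt"))`. The raw coverages themselves do not sum to any fixed value; only the normalized fractions sum to 1.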

ADD COMMENTlink written 2.6 years ago by Brian Bushnell16k
2.6 years ago by
David160
David160 wrote:

Thanks Brian. My confusion came from the difference between the number of mapped reads and coverage.
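
The difference between the two can be sketched with a small calculation. This is an approximation that assumes every mapped read aligns over its full length within the gene; the function name, read counts, and gene lengths are hypothetical, and the 150 bp read length is taken from the original post.

```python
def mean_coverage(mapped_reads, read_length, gene_length):
    """Approximate average fold coverage: mapped bases / gene length."""
    return mapped_reads * read_length / gene_length


# Same number of mapped reads, different gene lengths (hypothetical values).
short_gene = mean_coverage(mapped_reads=1000, read_length=150, gene_length=1000)
long_gene = mean_coverage(mapped_reads=1000, read_length=150, gene_length=5000)

print(short_gene)  # 150.0
print(long_gene)   # 30.0
```

Both genes attract the same number of reads, yet the shorter one is covered five times more deeply, which is why read counts alone are not comparable across genes of different lengths.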

The covstats.txt output gives a mean Avg_fold of 16.895 (see the attached picture summarizing the file). If I do this for each of my groups of genes (coming from the binning), I will end up with several coverage values.

Then, if we have mapped 10 different bins coming from the same sample, should the overall coverage be 100?

Also, what happens if one gene reference file maps to two different species?

https://dl.dropboxusercontent.com/u/24466146/mapped_reads_to_reference_genes.png

modified 2.6 years ago • written 2.6 years ago by David160

Please use ADD COMMENT/ADD REPLY to keep threads logically organized when responding to existing posts. This belongs under @Brian's post.

It may be best to upload the image to postimage.org or another free image host. Clicking on unknown Dropbox links carries an inherent risk.

modified 2.6 years ago • written 2.6 years ago by genomax70k

Hi David,

Unfortunately I'm not completely clear on what you are asking. Can you clarify? Nothing should necessarily add up to 100...

And I'm not sure what you mean by "one gene reference file maps to two different species". You're mapping reads to genes, not genes to species. But certainly, it is possible for the same gene to occur in two different species...

written 2.6 years ago by Brian Bushnell16k

Sorry for not being clear.

The idea in the end is to obtain an OTU table from the metagenomics sample. Say your sample contains 10 species. If you follow the pipeline, you end up with a list of genes (or bins) corresponding to each species. I want to know the relative abundance of each of the species.

Programs like Kraken or Kaiju do this directly from the raw reads, but I wanted to do it from the final bins (or the genes predicted for each bin). Does that make sense?

The problem is that an OTU table is normally built from 16S data, and I am not sure how to establish such a table from WGS data, with bins for instance. (In my case the sample contains 15 bins, although there are only 10 species.)

Thanks,

written 2.6 years ago by David160

Did you ever figure this out David? I am actually trying to do the same thing.

written 7 months ago by infenit10120

What I did was map the reads back to each of the bins to get the bin coverage. Assuming each bin corresponds to one genome, the coverage gives an approximate number of genome copies.
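
A rough sketch of that bin-level calculation, assuming per-contig average coverage and length are already available (for example from a coverage table) along with a contig-to-bin assignment. The function name, contig IDs, and numbers are hypothetical. Each bin's coverage is taken as the length-weighted mean of its contigs, and the fractions rest on the one-bin-one-genome assumption mentioned above.

```python
from collections import defaultdict


def bin_relative_abundance(contigs, contig_to_bin):
    """Length-weighted mean coverage per bin, normalized across bins.

    contigs: iterable of (contig_id, avg_fold, length) tuples.
    contig_to_bin: dict mapping contig_id -> bin_id.
    """
    covered_bases = defaultdict(float)
    bin_length = defaultdict(int)
    for contig_id, avg_fold, length in contigs:
        bin_id = contig_to_bin.get(contig_id)
        if bin_id is None:  # contig was not assigned to any bin
            continue
        covered_bases[bin_id] += avg_fold * length
        bin_length[bin_id] += length

    bin_coverage = {
        bin_id: covered_bases[bin_id] / bin_length[bin_id]
        for bin_id in bin_length
        if bin_length[bin_id] > 0
    }
    total = sum(bin_coverage.values())
    return {
        bin_id: (cov / total if total else 0.0)
        for bin_id, cov in bin_coverage.items()
    }


# Hypothetical contigs: (id, average fold coverage, length in bp).
contigs = [
    ("c1", 20.0, 1000),
    ("c2", 40.0, 3000),
    ("c3", 5.0, 2000),
    ("c4", 100.0, 500),  # unbinned, so it is ignored
]
contig_to_bin = {"c1": "bin1", "c2": "bin1", "c3": "bin2"}

fractions = bin_relative_abundance(contigs, contig_to_bin)
for bin_id, fraction in sorted(fractions.items()):
    print(f"{bin_id}\t{fraction:.3f}")
```

Reads from unbinned contigs are left out here, so the fractions describe only the binned part of the community.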

written 7 months ago by David160
5 months ago by
lagartija60
lagartija60 wrote:

Hi, I found this post useful because I have the same task (except that I prefer to work on contigs rather than on bins). I also mapped the reads to my contigs to get the mean coverage. Is that a sufficient estimate of the (relative) abundance, or are there further steps I should take to reduce biases?

written 5 months ago by lagartija60