Question: Rarefaction/Saturation Curve Based On NGS Data
8.5 years ago by
Leuven, Belgium
Biogenomics60 wrote:

Hi all,

This is most likely a simple question, but I'm looking for a tool (software, or a Python/Perl/R script) that would produce a rarefaction curve from an assembly file (ACE format would be easiest), to assess the number of reads needed to yield all observed contigs (cf. a species diversity index). This would most likely be done by sampling reads from the ACE file and aligning them to the assembled contigs. I would like to compare such rarefaction curves for data produced from normalized and non-normalized libraries.

Alternatively, what approach would you use to automate (or semi-automate) such a task?



ADD COMMENTlink modified 4.4 years ago by mikhail.shugay3.3k • written 8.5 years ago by Biogenomics60

Hello Greg, were you able to find a tool or script for this analysis? If so, could you let us know?

ADD REPLYlink written 5.6 years ago by Prakki Rama2.2k
8.2 years ago by
Casey Bergman18k
Athens, GA, USA
Casey Bergman18k wrote:

As you allude to, your problem is related to species richness calculations, so perhaps you could look at how to pose it in those terms and use the rarefaction functions in a metagenomics suite like mothur. Another option would be to pinch functions from the mothur source code and adapt them to your problem.
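To make the species-richness framing concrete, here is a minimal Python sketch (not mothur code, and not tied to any particular ACE parser). It assumes you have already extracted, for every read in the assembly, the name of the contig it belongs to; the function name `rarefaction_by_subsampling`, its parameters, and the toy data are all hypothetical. Each contig is treated as a "species" and each read as an "individual", and reads are subsampled without replacement at several depths.

```python
import random


def rarefaction_by_subsampling(read_to_contig, depths, n_reps=10, seed=0):
    """Return (depth, mean number of distinct contigs) for each depth.

    read_to_contig: list with one contig name per read (assumed input).
    depths: subsample sizes, each between 0 and len(read_to_contig).
    n_reps: number of random subsamples averaged per depth.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    curve = []
    for depth in depths:
        total = 0
        for _ in range(n_reps):
            # Draw `depth` reads without replacement.
            sample = rng.sample(read_to_contig, depth)
            # Count how many distinct contigs those reads hit.
            total += len(set(sample))
        curve.append((depth, total / n_reps))
    return curve


# Toy data: 100 reads spread unevenly over 5 contigs.
reads = ["c1"] * 50 + ["c2"] * 30 + ["c3"] * 15 + ["c4"] * 4 + ["c5"]
curve = rarefaction_by_subsampling(reads, depths=[1, 10, 25, 50, 100])
for depth, mean_contigs in curve:
    print(depth, round(mean_contigs, 2))
```

Plotting the mean contig count against depth gives the saturation curve; running it separately on the normalized and non-normalized libraries would allow the two curves to be compared.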

ADD COMMENTlink written 8.2 years ago by Casey Bergman18k

Mothur really works! I like it.

ADD REPLYlink written 8.2 years ago by Jarretinha3.3k

Hi Jarretinha, I am new to this kind of analysis. If possible, could you let us know how mothur can be used to plot a saturation curve of the number of genes against the number of reads?

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by Prakki Rama2.2k
4.4 years ago by
Czech Republic, Brno, CEITEC
mikhail.shugay3.3k wrote:

As this topic has been raised again, I would recommend reading Colwell et al. on the subject. I would also like to ask: what is the input format? If you have a simple frequency table, say

150 genes have 1 read

100 genes have 2 reads


1 gene has 6534 reads


1 gene has 20000 reads

I could share some code to build those rarefaction curves (and I think there are also plenty of ecology-related software packages). Or you can adapt code from here:
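For a frequency table of that shape, a minimal Python sketch is given here. It is not the code referred to above; it is an illustration using the classical hypergeometric rarefaction expectation from ecology. If N reads are distributed over genes and a gene has k reads, the probability that a random subsample of n reads misses it is C(N-k, n) / C(N, n), so the expected number of genes seen is the sum of one minus that probability over all genes. The function names and the dictionary layout `{reads_per_gene: number_of_genes}` are assumptions, and the toy input uses only the four rows shown in the example.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def expected_genes(freq_table, n):
    """Expected number of genes observed in a subsample of n reads.

    freq_table: dict mapping reads-per-gene -> number of genes with that count.
    """
    total_reads = sum(k * g for k, g in freq_table.items())
    if not 0 <= n <= total_reads:
        raise ValueError("n must be between 0 and the total number of reads")
    expected = 0.0
    for k, g in freq_table.items():
        if total_reads - k < n:
            # Subsample is too large to avoid this gene entirely.
            p_missed = 0.0
        else:
            p_missed = math.exp(log_comb(total_reads - k, n)
                                - log_comb(total_reads, n))
        expected += g * (1.0 - p_missed)
    return expected


# Toy input: only the rows shown in the example table.
table = {1: 150, 2: 100, 6534: 1, 20000: 1}
total = sum(k * g for k, g in table.items())
for n in (1, 100, 1000, 10000, total):
    print(n, round(expected_genes(table, n), 2))
```

Working in log space avoids overflow for large read counts. Evaluating `expected_genes` over a grid of n values and plotting the result gives the rarefaction curve without any random subsampling.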

ADD COMMENTlink written 4.4 years ago by mikhail.shugay3.3k