Question: Rarefaction/Saturation Curve Based On NGS Data
6
8.9 years ago by
Biogenomics60
Leuven, Belgium
Biogenomics60 wrote:

Hi all,

This is most likely a simple question, but I'm looking for a tool (software, or a Python/Perl/R script) that would produce a rarefaction curve from an assembly file (ACE format would be easiest) to assess the number of reads needed to recover all observed contigs (cf. species diversity indices). This would most likely be done by sampling reads from the ACE file and aligning them to the assembled contigs. I am interested in comparing such rarefaction curves for data produced from normalized and non-normalized libraries.

Alternatively, what approach would you use to automate (or semi-automate) such a task?

Thanks,

Greg

• 5.0k views
modified 4.9 years ago by mikhail.shugay3.3k • written 8.9 years ago by Biogenomics60

Hello Greg, were you able to find a tool or script for this analysis? If so, could you let us know which one?

written 6.1 years ago by Prakki Rama2.3k
2
8.7 years ago by
Casey Bergman18k
Athens, GA, USA
Casey Bergman18k wrote:

As you allude to, your problem is related to species richness calculations, so perhaps you could pose it in those terms and use the rarefaction functions in a metagenomics suite like mothur. Another option would be to pinch functions from the mothur source code and adapt them to your problem.

written 8.7 years ago by Casey Bergman18k

Mothur really works! I like it.

written 8.7 years ago by Jarretinha3.3k

Hi Jarretinha, I am new to this kind of analysis. If possible, could you let us know how mothur can be used to plot a saturation curve of the number of genes against the number of reads?

modified 6.1 years ago • written 6.1 years ago by Prakki Rama2.3k
0
4.9 years ago by
mikhail.shugay3.3k
Czech Republic, Brno, CEITEC
mikhail.shugay3.3k wrote:

As this topic was raised again, I would recommend reading Colwell et al. on the subject. I would also like to ask what the input format is. If you have a simple frequency table, say

150 genes have 1 read

100 genes have 2 reads

...

1 gene has 6534 reads

...

1 gene has 20000 reads

I could share some code to build those rarefaction curves (and I think there are also plenty of ecology-related software packages). Or you can adapt code from here: https://github.com/mikessh/vdjtools/blob/master/src/main/groovy/com/antigenomics/vdjtools/diversity/ChaoEstimator.groovy
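
For readers who want a starting point, here is a minimal Python sketch of how a rarefaction curve could be computed from such a frequency table. It is not the code offered above and not the vdjtools ChaoEstimator; the names `log_comb` and `rarefaction_curve` are made up for illustration, and the input is assumed to be a dict mapping "reads per gene" to "number of genes with that many reads". It uses the standard expectation for sampling n reads without replacement: each gene with k reads is missed with probability C(N-k, n) / C(N, n), where N is the total number of reads.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), for 0 <= k <= n."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def rarefaction_curve(freq_table, depths):
    """Expected number of genes observed at each subsampling depth.

    freq_table: dict {reads_per_gene: number_of_genes_with_that_many_reads}
                (an assumed input layout, not a standard file format).
    depths:     iterable of subsample sizes n, each with 0 <= n <= total reads.
    """
    total_reads = sum(k * f for k, f in freq_table.items())
    total_genes = sum(freq_table.values())
    curve = []
    for n in depths:
        if not 0 <= n <= total_reads:
            raise ValueError("depth must be between 0 and the total read count")
        log_all_samples = log_comb(total_reads, n)
        expected_missed = 0.0
        for k, f in freq_table.items():
            # A gene with k reads can only be missed if n reads can be
            # drawn entirely from the other total_reads - k reads.
            if total_reads - k >= n:
                p_missed = exp(log_comb(total_reads - k, n) - log_all_samples)
                expected_missed += f * p_missed
        curve.append(total_genes - expected_missed)
    return curve


# Toy table built only from the rows listed above; the elided rows are
# left out, so these numbers are purely illustrative.
toy_table = {1: 150, 2: 100, 6534: 1, 20000: 1}
toy_depths = [0, 100, 1000, 5000, 10000, 26884]
for depth, genes in zip(toy_depths, rarefaction_curve(toy_table, toy_depths)):
    print(depth, round(genes, 2))
```

Plotting the depths against the returned values gives the curve; comparing how quickly it flattens for normalized and non-normalized libraries addresses the original question. Log-gamma is used instead of direct binomial coefficients only to avoid overflow at realistic read counts.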

modified 10 weeks ago by RamRS24k • written 4.9 years ago by mikhail.shugay3.3k