Hi, I'm trying to make a rarefaction curve in R using OTUs, but I have no idea how to go about it. How would I proceed?
My data comes from metagenomic 16S sequencing, and the files I have are split by taxonomic level. The genus-level file, for example, has the genus in one column and the number of OTUs that align with that genus in the other column. Any ideas?
A rarefaction curve shows the number of unique OTUs observed as a function of the number of units (usually reads) sampled. So to produce the curve, you have to produce the equivalent of your "number of OTUs that align with that genus" column for each subsampling depth you want to investigate.
Subsample your raw reads at several depths, for example every 10% from 10% to 100% of the total.
For each subsample, count the number of unique OTUs observed at the taxonomic level you are interested in.
Plot the curve with the number of unique OTUs on the y axis and the subsampling depth on the x axis.
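The steps above can be sketched in Python as follows. This is a minimal sketch, not a pipeline implementation: it assumes the genus table has been loaded into a dict mapping each genus to its read count (i.e. it assumes the second column holds reads per genus), and the function name `rarefaction_curve` and the toy counts are made up for illustration. It expands the table into one entry per read, shuffles once, and counts unique genera in growing prefixes.

```python
import random


def rarefaction_curve(counts, fractions=None, seed=None):
    """Random-subsampling rarefaction curve from a count table (hypothetical helper).

    counts: dict mapping taxon (e.g. genus) -> number of reads assigned to it
            (assumption: the table holds read counts, not OTU counts).
    fractions: subsampling depths as fractions of the total read count;
               defaults to 10%, 20%, ..., 100%.
    seed: optional seed so the random subsample is reproducible.
    Returns a list of (depth, number_of_unique_taxa) pairs.
    """
    if fractions is None:
        fractions = [i / 10 for i in range(1, 11)]

    # Expand the table into one entry per read so reads can be sampled individually.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]

    # Shuffle once and take growing prefixes: each prefix is a random subsample
    # without replacement, and the subsamples are nested inside each other.
    rng = random.Random(seed)
    rng.shuffle(reads)

    curve = []
    for f in fractions:
        depth = round(f * len(reads))
        # Number of distinct taxa seen among the first `depth` shuffled reads.
        curve.append((depth, len(set(reads[:depth]))))
    return curve
```

For example, `rarefaction_curve({"genus_A": 50, "genus_B": 30, "genus_C": 15, "genus_D": 5}, seed=1)` returns ten (depth, unique genera) pairs ending at `(100, 4)`. Taking prefixes of a single shuffle keeps the subsamples nested, so the curve never decreases; drawing each depth independently is also valid but noisier. Averaging several seeds smooths the curve before plotting.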
Pipelines such as mothur and QIIME have these functions built in for 16S data.
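The shuf utility can easily be applied to this kind of subsampling: you just increase -n for each round. Note that shuf samples lines, so run it on a file with one read per line rather than directly on FASTA/FASTQ files, where each read spans several lines.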
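Random subsampling gives a slightly different curve on each run. If you would rather have the expected curve, it can be computed exactly from the count table with Hurlbert's rarefaction formula, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N is the total read count and N_i the reads in taxon i. To my knowledge this is what `rarefy()` in R's vegan package computes, and vegan's `rarecurve()` plots such curves directly. A Python sketch of the formula, where `expected_taxa` is a hypothetical name and the counts are again assumed to be reads per taxon:

```python
from math import comb


def expected_taxa(counts, n):
    """Expected number of distinct taxa in a random subsample of n reads
    drawn without replacement (Hurlbert's rarefaction formula).

    counts: iterable of read counts per taxon (assumed to be reads, not OTUs).
    n: subsampling depth, between 0 and the total number of reads.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of reads")
    all_draws = comb(total, n)
    # Each taxon contributes the probability that at least one of its reads is
    # drawn: 1 - P(all n reads come from the other total - c reads).
    # math.comb returns 0 when n > total - c, so such taxa contribute exactly 1.
    return sum(1 - comb(total - c, n) / all_draws for c in counts)
```

For example, `expected_taxa([5, 3, 2], 10)` is `3.0` because every read is drawn, and evaluating the function over a range of depths gives the whole curve. The exact integer binomials are fine for typical amplicon libraries but become slow at very large depths.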