How to deal with differences in sample size for metagenomic diversity based on abundance?
21 months ago

I have a large metagenomic sample set across 15 sampling locations. Some locations have as many as four replicates while others have only one, so the number of FASTQ reads I have for each site varies widely.

I can calculate abundance with Bracken and pull the data into something like phyloseq for diversity analysis, but do I need to normalize the number of reads first? That is, I don't want my diversity estimates to be biased by unequal sample sizes.
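
To make the question concrete: one commonly discussed form of read-count normalization is rarefaction, i.e. randomly subsampling every sample down to the same number of reads before computing diversity. Below is a minimal Python sketch of that idea. The counts are made up, and the helper names `rarefy` and `shannon` are hypothetical, not part of Bracken or phyloseq. It only illustrates what "normalizing the number of reads" could mean here and is not a claim that this is the best approach.

```python
import numpy as np

def rarefy(counts, depth, seed=0):
    """Subsample per-taxon read counts to `depth` reads, without replacement."""
    counts = np.asarray(counts, dtype=np.int64)
    if depth > counts.sum():
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = np.random.default_rng(seed)
    # Drawing reads without replacement from taxa with the given counts
    # follows a multivariate hypergeometric distribution.
    return rng.multivariate_hypergeometric(counts, depth)

def shannon(counts):
    """Shannon diversity index (natural log) from per-taxon counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

# Made-up per-taxon read counts for two sites with very different depths
# (e.g. four pooled replicates versus a single replicate).
site_a = [5000, 3000, 1500, 400, 100, 3, 1]
site_b = [500, 300, 150, 40, 10, 0, 0]

# Rarefy both sites to the depth of the shallowest sample.
depth = min(sum(site_a), sum(site_b))
for name, counts in [("site_a", site_a), ("site_b", site_b)]:
    sub = rarefy(counts, depth)
    print(name, "reads:", int(sub.sum()), "Shannon:", round(shannon(sub), 3))
```

Rarefying discards reads from the deeper samples, so whether this trade-off is acceptable is part of what I am asking.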

What is the best way to deal with this problem? Any help is appreciated.

metagenomics sample-size

