21 months ago
goatsrunfaster ▴ 50
I have a large metagenomic sample set across 15 sampling locations. Some locations have as many as four replicates while others have only a single replicate, so the number of FASTQ reads I have for each site varies widely.
I can calculate abundance with Bracken and pull the data into something like phyloseq for diversity analysis, but do I need to normalize the number of reads? That is, I don't want my diversity estimates to be biased by unequal sample sizes.
What is the best way to deal with this problem? Any help is appreciated.
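To make the question concrete, here is a minimal sketch of one kind of normalization I have in mind: rarefying every sample to the depth of the shallowest one before computing a diversity index. The taxon names and counts are made up for illustration, and `rarefy` and `shannon` are hypothetical helpers, not part of Bracken or phyloseq (phyloseq has its own `rarefy_even_depth` for this). I am not assuming this is the best approach, only showing what I mean.

```python
import math
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Subsample a taxon -> read-count dict to exactly `depth` reads, without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    # Expand the counts into one label per read, then draw `depth` reads.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(pool, depth)))

def shannon(counts):
    """Shannon diversity index (natural log) of a taxon -> count dict."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)

# Hypothetical Bracken-style read counts for two sites with unequal depth.
samples = {
    "site_A": {"Bacteroides": 5000, "Prevotella": 3000, "Escherichia": 2000},
    "site_B": {"Bacteroides": 500, "Prevotella": 300, "Escherichia": 200},
}

# Rarefy every sample down to the depth of the shallowest sample.
depth = min(sum(c.values()) for c in samples.values())
rarefied = {name: rarefy(c, depth) for name, c in samples.items()}
diversity = {name: shannon(c) for name, c in rarefied.items()}
```

After this step every site has the same number of reads, so differences in `diversity` are no longer driven by sequencing depth alone, at the cost of discarding reads from the deeper samples.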