I would like to normalize my amplicon sequencing data from a metagenomics study. I have looked into RNAseq and 16S normalization methods, and I'm now looking more closely at TMM and DESeq. The problem is two-fold:
1) I am not familiar with RNAseq or 16S analysis; I have never done either. I've read papers (Li P et al., Evans et al., Li et al., and Risso et al.), but I'm uncomfortable just "winging it". So my first question is: does anyone have suggestions I should keep in mind, given that I'm normalizing amplicon sequencing data that is neither 16S nor RNAseq?
2) The data I have been provided with is not count data; I have mean depth and coverage for each gene in each sample. So I need some kind of normalization that can be applied to this level of data. That may not be ideal, but that's OK: I can use it as a first pass and get count data as soon as I can. After normalization, I will need to produce a heatmap to visualize the results. I'm hoping to get some expert advice that I can combine with what I've gleaned from the literature.
Thank you all for your advice.
Sample data format:
obsID  geneID    averageDepth  coverage
1      oqxA      252.6         1.00
2      erm(X)    1069.2        0.95
3      blaSHV    451.8         0.95
4      aph(6)    357.3         0.93
5      aph(3'')  92.6          0.75
6      dfrA17    48.6          0.74
7      mph(A)    16.4          0.73
8      blaTEM    950.2         0.68
9      strA      3075.4        0.65
10     erm(G)    18.5          0.63
averageDepth is the sum of the per-base depth reported by Samtools divided by the length of the reference.
coverage is the number of bases with non-zero coverage divided by the number of bases in the reference.
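
As a minimal sketch of the two definitions above, the following Python computes both values from per-base depth rows. It assumes tab-separated rows of reference name, position, and depth (the three-column layout that `samtools depth` prints) and assumes the reference lengths are supplied separately. The function name `depth_and_coverage`, the gene name `geneA`, and the numbers are hypothetical and only for illustration.

```python
from collections import defaultdict


def depth_and_coverage(depth_lines, ref_lengths):
    """Return averageDepth and coverage per reference.

    depth_lines: iterable of "ref<TAB>pos<TAB>depth" strings (assumed layout).
    ref_lengths: dict mapping reference name -> reference length in bases.
    """
    depth_sum = defaultdict(int)  # total per-base depth for each reference
    covered = defaultdict(int)    # number of positions with depth > 0

    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        ref, _pos, depth = line.split("\t")
        d = int(depth)
        depth_sum[ref] += d
        if d > 0:
            covered[ref] += 1

    # Dividing by the full reference length means positions missing from
    # the input are treated as zero depth.
    return {
        ref: {
            "averageDepth": depth_sum[ref] / length,
            "coverage": covered[ref] / length,
        }
        for ref, length in ref_lengths.items()
    }


# Hypothetical 4-base reference where position 3 has no reads.
rows = ["geneA\t1\t4", "geneA\t2\t6", "geneA\t4\t2"]
stats = depth_and_coverage(rows, {"geneA": 4})
print(stats["geneA"])
```

Because both values are divided by the full reference length, a gene with low coverage will also have its averageDepth pulled down by the uncovered bases, which may matter when comparing genes in a heatmap.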