I ran FastQC on a human gut metagenome sample and found a duplication rate of about 80%. Does this seem too high? I checked some environmental samples and saw approximately the same rates.
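For context, one rough way to cross-check a figure like this outside FastQC is to count exact-duplicate sequences directly. The sketch below is only an illustration: the function name `duplication_rate` is hypothetical, it assumes a standard 4-line-per-record FASTQ file (optionally gzipped), and it counts only exact full-length matches. FastQC estimates its number from a subset of the file, so the two values need not agree exactly.

```python
import gzip
from collections import Counter


def duplication_rate(fastq_path):
    """Return the fraction of reads that are exact copies of an earlier read.

    Assumes 4 lines per FASTQ record (no wrapped sequence lines).
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                counts[line.strip()] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Reads beyond the first occurrence of each sequence count as duplicates.
    return 1 - len(counts) / total
```

This holds every distinct sequence in memory, so for a large metagenome it would probably need to be run on a subsample of the reads.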
I've read papers that recommend de-duplicating reads before analysis because duplicates are most likely PCR artefacts. However, I've also read papers that recommend keeping all reads, since high-abundance species will be sequenced deeply and some of their reads may legitimately be seen more than once. Any thoughts on the matter?