I have about 60 metagenomic read sets from 60 different individuals (one sample per person). I want to find regions of differential mapping between these samples. In other words, do samples A and B show similar mapping patterns on a given contig? I have tried assembling the samples separately, then combining the assemblies and deduplicating the combined assembly with dedupe.sh from BBMap.
This still left a lot of similar contigs, and the remaining duplicated sequences hurt mapping quality. I can play around with the minimum sequence identity threshold for the deduplication step, but deduplicating aggressively enough to fix the problem removed a good amount of sequence, which could confound our analysis.
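To make "similar mapping patterns" concrete, here is a minimal sketch of the kind of comparison I have in mind. It assumes per-window mean read depth along one contig has already been extracted for each sample; the sample names, depth values, and the choice of Pearson correlation are all illustrative assumptions, not part of my actual pipeline.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length depth profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same number of windows")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a flat profile has no defined correlation
    return cov / sqrt(var_x * var_y)

# Hypothetical per-window mean depths along a single contig (made-up values).
depth = {
    "sample_A": [10, 12, 11, 0, 0, 9],
    "sample_B": [20, 24, 22, 0, 0, 18],   # same shape as A, deeper sequencing
    "sample_C": [0, 0, 1, 15, 14, 0],     # covers the region A and B miss
}

r_ab = pearson(depth["sample_A"], depth["sample_B"])
r_ac = pearson(depth["sample_A"], depth["sample_C"])
print(f"A vs B: {r_ab:.2f}")
print(f"A vs C: {r_ac:.2f}")
```

Correlation ignores overall sequencing depth, so A and B count as similar even though B is twice as deep, while C maps to a different part of the contig and comes out negatively correlated with A.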
I have now started wondering whether co-assembling our samples would be a better approach than assembling them separately. My concern is that co-assembling possibly disparate samples would create chimeric contigs, built from artificial combinations of reads from different samples that do not correspond to any real sequence.
Please let me know if you folks have any advice!