Hi,
I have a dataset that I would like to analyze using Salmon. The data is paired-end Illumina and has a very high duplication rate of 90-95%, as determined by MarkDuplicates in Picard (small genome, oversequenced). I would like to remove duplicates using Clumpify before running Salmon. I know Clumpify reorders reads to reduce file sizes, and I want to make sure this will not interfere with Salmon, which I understand should not be given coordinate-sorted data. I'm assuming Clumpify does not sort by coordinate, but I want to make sure I'm not missing anything. Thanks!