I am using the dedup function in UMI-tools, but there still seems to be an issue of excessive memory usage while the output is being generated. Kindly let me know if there is an alternative tool/way to remove the duplicates and count the UMIs.
If you are doing Drop-seq or 10x Chromium, I highly recommend alevin. Other tools that can handle UMI deduplication are STARsolo, umis and picard MarkDuplicatesWithCigar. Note that the last two do not do error correction.
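For reference, a minimal alevin invocation might look like the sketch below. The index path and fastq names are hypothetical placeholders, and the flags assume 10x Chromium chemistry (use --dropseq for Drop-seq). The command is stored in a variable and only executed when salmon and the index are actually present, so the sketch is a no-op otherwise:

```shell
# Sketch of an alevin run; 'salmon_index' and the fastq file names are
# hypothetical placeholders. --chromium assumes 10x Chromium data.
alevin_cmd="salmon alevin -l ISR -i salmon_index \
    -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
    --chromium -p 8 -o alevin_out"

# Run it only when salmon and the index actually exist on this machine:
if command -v salmon >/dev/null 2>&1 && [ -d salmon_index ]; then
    $alevin_cmd
fi
```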
Three things might cause excessive memory usage:

1. Many reads whose mates map to a different contig. There is no solution here unless you are willing to drop these reads; no other tool is going to do any better.
2. Analysing single-cell RNA-seq without using the --per-cell option.
3. Extreme read depth, with an appreciable percentage saturation of the space of possible UMIs.
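To get a feel for when saturation bites: the number of possible UMIs of length L is 4^L, so for the common 10 bp UMIs:

```shell
# The space of possible UMIs for an L-bp UMI is 4^L.
umi_len=10
umi_space=1
i=0
while [ "$i" -lt "$umi_len" ]; do
    umi_space=$((umi_space * 4))
    i=$((i + 1))
done
echo "possible UMIs for a ${umi_len} bp UMI: ${umi_space}"   # 1048576
```

Roughly speaking, once the number of distinct UMIs observed at a single position approaches this limit, the graph of near-identical UMIs that the error-correcting methods build has many more edges to track, and memory climbs with it.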
The general advice for reducing memory usage is to avoid the --output-stats option.
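Putting that advice together with --per-cell, a single-cell dedup call might look like the following sketch (BAM file names are hypothetical placeholders; --output-stats is deliberately absent). As above, the command only actually runs when umi_tools and the input BAM exist:

```shell
# Sketch of a single-cell dedup run; BAM names are hypothetical.
# --per-cell groups reads by cell barcode; --output-stats is omitted
# on purpose to keep memory usage down.
dedup_cmd="umi_tools dedup --per-cell \
    -I aligned.sorted.bam -S deduped.bam"

# Run it only when umi_tools and the input BAM actually exist:
if command -v umi_tools >/dev/null 2>&1 && [ -f aligned.sorted.bam ]; then
    $dedup_cmd
fi
```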
If you are struggling with high read depth and UMI-space saturation, you can switch to --method=unique. The downside is that you lose UMI-tools' error-correcting deduplication; we have shown that skipping error correction introduces bias, especially at high depth. The upside is that it makes UMI-tools behave effectively the same as any other UMI-aware tool. Really only umi_tools, alevin and STARsolo perform error correction on UMIs. Otherwise, it is just an intrinsically memory-hungry task.
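As a sketch, the low-memory variant of a single-cell dedup call would just add --method=unique (file names again hypothetical, and guarded the same way):

```shell
# Exact-match dedup: reads are collapsed only when their UMIs are
# identical, which keeps memory low but skips error correction.
unique_cmd="umi_tools dedup --method=unique --per-cell \
    -I aligned.sorted.bam -S deduped_unique.bam"

# Run it only when umi_tools and the input BAM actually exist:
if command -v umi_tools >/dev/null 2>&1 && [ -f aligned.sorted.bam ]; then
    $unique_cmd
fi
```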