Deduplication
14 months ago

I am trying to run deduplication on BAM files with the following command:

umi_tools dedup -I 0ng_Rep1.sorted.bam --output-stats 0ng_Rep1.sorted.deDup.stats -S 0ng_Rep1.sorted.deDup.bam --method directional --log 0ng_Rep1.sorted.deDup.log.txt

Sometimes it works, but sometimes the process gets killed by the system, possibly because of excessive memory use.

I am running this on a system with 64 GB of RAM and an i9-12900 processor, which has given me no trouble so far with bacterial NGS data analysis. Any suggestions for avoiding this problem would be helpful.

Tags: UMI • umi_tools • removal • deduplication • PCR-duplicate

This is what is happening:

[   49.039813] hv_balloon: Max. dynamic memory size: 52178 MB
[   61.369860] WSL2: Performing memory compaction.
[  851.939065] umi_tools invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[  851.939067] CPU: 4 PID: 356 Comm: umi_tools Not tainted 5.10.102.1-microsoft-standard-WSL2 #1

The process is being killed by the Linux out-of-memory (OOM) killer inside WSL2. Is there any way to avoid this?
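The hv_balloon line in your log shows WSL2 is capped at about 52 GB even though the machine has 64 GB. One thing to try (a sketch; the exact values are up to you) is raising the WSL2 memory limit in a `.wslconfig` file in your Windows user profile directory, then restarting WSL with `wsl --shutdown`:

```ini
[wsl2]
# Let the VM use more of the host's 64 GB (value is an example, not a recommendation)
memory=56GB
# Optional: swap space delays the OOM killer at the cost of speed
swap=16GB
```

Adding swap will make the run slower but may let it finish instead of being killed.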


i.sudbery may have some input.

If the process is running out of memory, you may not be able to run this on this machine and may have to find one with more RAM. Deduplication algorithms can need to hold a large amount of data in memory, since they compare reads against one another as they go.
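If a bigger machine isn't available, one way to cut peak memory (an untested sketch on my part; `in.sorted.bam` and the output names are placeholders) is to deduplicate one contig at a time and merge the results, so umi_tools only ever holds one contig's reads:

```shell
# Sketch: per-contig deduplication to reduce peak memory. Filenames are placeholders.
samtools index in.sorted.bam

# Pull contig names from the @SQ lines of the BAM header
contigs=$(samtools view -H in.sorted.bam | awk '/^@SQ/{sub("SN:","",$2); print $2}')

for c in $contigs; do
    samtools view -b in.sorted.bam "$c" > tmp."$c".bam
    samtools index tmp."$c".bam
    # --output-stats is omitted here; collecting stats is reported to increase memory use
    umi_tools dedup -I tmp."$c".bam -S dedup."$c".bam --method directional
done

samtools merge -f deduped.bam dedup.*.bam
```

For bacterial data with a single chromosome this won't help much, but dropping `--output-stats` alone may still reduce the footprint.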

11 months ago

You can deduplicate the FASTQ files using BBTools' Clumpify:

clumpify.sh in=reads.fq out=deduped.fq dedupe

Because it writes temporary files for large datasets, it should run fine even on data that won't fit into memory. However, I don't think it accepts BAM input; you'd need to re-map the deduplicated reads afterward.
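Since your data is already aligned, the rough workflow would look something like this (a sketch, not tested on your data; `ref.fa` and the filenames are placeholders, and `bwa` is just an example aligner):

```shell
# Convert the aligned reads back to FASTQ (single-end shown; placeholders throughout)
samtools fastq in.sorted.bam > reads.fq

# Sequence-based duplicate removal; spills to temp files if data exceeds RAM
clumpify.sh in=reads.fq out=deduped.fq dedupe

# Re-map the deduplicated reads and coordinate-sort the result
bwa mem ref.fa deduped.fq | samtools sort -o deduped.sorted.bam -
```

Note that unlike umi_tools' directional method, Clumpify's `dedupe` compares the read sequences themselves, so UMI-aware grouping is lost unless the UMI is still part of the read.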
