Question: RAM / cache usage in frequently used bioinformatics pipelines
CY40 (United States) wrote, 3 months ago:

I think we all agree that RAM / cache consumption is an issue when launching a bioinformatics pipeline. I would like to open a discussion on which tools / steps demand large amounts of RAM. I know STAR is pretty RAM-hungry when loading a genome (~30 GB), and indexing and sorting probably are as well.

Can anyone share some general insight on this topic? Any ideas are appreciated!

Tags: snp rna-seq next-gen alignment

Assemblies of large genomes/datasets invariably need large amounts of RAM. Trinity, for example, needs roughly 1 GB of RAM per million paired-end reads.
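As a back-of-the-envelope check, that 1 GB per million paired-end reads rule of thumb can be turned into a quick estimator (a hypothetical helper for illustration, not part of Trinity itself):

```python
def trinity_ram_estimate_gb(read_pairs):
    """Rough Trinity RAM estimate: ~1 GB per million paired-end reads,
    per the rule of thumb above. Hypothetical helper, not a Trinity tool."""
    return read_pairs / 1_000_000

# e.g. a dataset of 250 million read pairs:
print(trinity_ram_estimate_gb(250_000_000))  # 250.0 (GB)
```

So even a single moderately deep RNA-seq dataset can push an assembly job well past what a typical workstation offers.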

clumpify.sh from the BBMap suite is able to deduplicate data in RAM with the dedupe option. I have used Clumpify with 50G NovaSeq data and seen memory consumption as high as 1.3 TB. Note: Clumpify is also able to use disk storage and temp files, so holding everything in RAM is not an absolute requirement.

In general, some software can fall back on workarounds (such as spilling to temp storage on disk), but for other tasks (assembly in particular) there may be no valid alternative to gobs of RAM.
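The RAM-versus-temp-storage trade-off above can be sketched as a simple capacity check (all names and the overhead factor here are hypothetical illustration, not the actual logic of Clumpify or any other tool):

```python
def choose_processing_mode(dataset_gb, available_ram_gb, overhead=1.5):
    """Decide between fully in-RAM and disk-backed processing.

    Tools like Clumpify can hold all reads in memory when they fit,
    and fall back to temp files on disk when they do not. The 1.5x
    overhead factor is an illustrative assumption for per-read
    bookkeeping, not a measured constant.
    """
    if dataset_gb * overhead <= available_ram_gb:
        return "in-ram"
    return "disk-spill"

print(choose_processing_mode(50, 256))    # in-ram: fits comfortably
print(choose_processing_mode(1300, 512))  # disk-spill: exceeds RAM
```

The point is just that "needs X GB of RAM" is often really "needs X GB of RAM *or* enough fast scratch disk plus extra runtime" — assembly being the notable case where the disk fallback usually doesn't exist.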

Answered 3 months ago by genomax
Powered by Biostar version 2.3.0