bbsplit running slow or out of memory?
3.3 years ago
rrdavis ▴ 60

Hello,

I have Illumina fastq files from some RNA-seq, ATAC-seq and WES that originated as PDX samples. I am looking to filter out contaminating mouse reads from the human reads in these datasets.

I have used Xenome before but wanted to try bbsplit. Both were attractive because they work directly on the fastq files, so there is no need to align to mouse and human separately and then compare and filter the resulting BAMs with tools like ngs-disambiguate, XenofilteR, etc.

I successfully built an index for bbsplit using the human and mouse genomes:

bbsplit.sh -Xmx40g build=1 path=/home/ryan/Reference/bbsplit_mm10_hg38 ref_Mouse=/home/ryan/Reference/Mus_musculus/Ensembl/STAR_reference/Mus_musculus.GRCm38.dna.primary_assembly.fa ref_Human=/home/ryan/Reference/Homo_sapiens/Ensembl/gencode_GRCH38/GRCh38.primary_assembly.genome.fa

I then ran bbsplit as such:

bbsplit.sh -Xmx40g path=/home/ryan/Reference/bbsplit_mm10_hg38/ build=1 in=/home/ryan/NGS_Data/JCA108_S9_L004_R1_001.fastq.gz in2=/home/ryan/NGS_Data/JCA108_S9_L004_R2_001.fastq.gz refstats=/home/ryan/NGS_Data//test/JCA108_stats.txt basename=/home/ryan/NGS_Data/test/JCA108_%_#.fq.gz

I am running this on a Linux system with 48G RAM and 8 threads, and the process is taking a long time (over 24 hrs so far). Do I need a lot more RAM to use it? The output file is growing, but very slowly!

Thanks!

ryan

next-gen-sequencing RNA-Seq WES ATAC-seq • 1.8k views

Possibly, since you are using both the human and mouse genomes. If you have access to better hardware I suggest you move the analysis there.


Thanks @GenoMax! Do you think something with at least 70GB of RAM would do?


If the process is taking a while (but working) then you can just let it finish. It can take a while depending on the size of your input data. One way to speed it up would be to use more RAM and more cores, but that would mean starting over on a new machine.
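As a rough way to pick the -Xmx value when moving to a bigger machine, the sketch below derives a heap size from the MemTotal line of /proc/meminfo. The function name suggest_xmx and the 85% default are assumptions made for illustration, not something stated in this thread; leave more headroom if other jobs share the machine.

```shell
#!/usr/bin/env bash
# Suggest a JVM heap flag (-Xmx) as a share of physical RAM.
# NOTE: suggest_xmx and the 85% default are illustrative assumptions.
suggest_xmx() {
    local meminfo="${1:-/proc/meminfo}"   # meminfo-format file to read
    local percent="${2:-85}"              # share of RAM to hand to the JVM
    awk -v pct="$percent" '
        /^MemTotal:/ {
            # MemTotal is reported in kB; convert to whole GiB, rounding down
            gib = int($2 * pct / 100 / 1024 / 1024)
            printf "-Xmx%dg\n", gib
        }' "$meminfo"
}

# Example (Linux only): print the suggested flag for this machine.
# suggest_xmx
```

With MemTotal at exactly 48 GiB this yields -Xmx40g. The kernel usually reports slightly less than the installed RAM, so the result on a real machine may come out a gigabyte lower.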

9 months ago
Kermit ▴ 90

Profiling bbmap/bbsplit.sh

Here is my experience with a paired-end sample (2 x 3GB), 2 species, and 64GB RAM:

  • Succeeded = Ran with -Xmx56g threads=2 and swapfile 10GB overnight for 10.25 hours. I did not check whether, or by how much, it dipped into swap.

  • Succeeded = Ran with -Xmx56g threads=4 and swapfile 10GB overnight. After 1 hour this had dipped into 2.5 GB of swap. It was writing the graft output file at 2x speed compared to 2 threads.

  • Failed = Ran with -Xmx60g threads=4 and swapfile 2GB. Spilled into swap after 1 hour, then failed at the 1.5 hour mark during what I believe was the final stage of writing to disk. Maybe it was a fluke, or maybe there is a link between threads and memory consumption (perhaps due to some kind of reduce step at the end).

  • Failed = Ran with -Xmx60g threads=8 and swapfile 10GB. Immediately maxed out all RAM and swap.

  • I'll update with 128GB RAM in a few days
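To see how far a run like the ones above dips into swap, one option is to read SwapTotal and SwapFree from /proc/meminfo at intervals. This is a minimal sketch, assuming a Linux host; the function name swap_used_mib is hypothetical and the polling loop is left commented out.

```shell
#!/usr/bin/env bash
# Report swap currently in use, in MiB, from a meminfo-format file.
# NOTE: swap_used_mib is an illustrative helper, not part of BBTools.
swap_used_mib() {
    local meminfo="${1:-/proc/meminfo}"
    awk '
        /^SwapTotal:/ { total = $2 }   # values are in kB
        /^SwapFree:/  { free  = $2 }
        END { printf "%d\n", (total - free) / 1024 }' "$meminfo"
}

# Example: log swap use once a minute while bbsplit.sh runs in another shell.
# while sleep 60; do echo "$(date +%T) swap used: $(swap_used_mib) MiB"; done
```

Logging this alongside the thread count would help tell whether the failures come from heap size alone or from extra memory used per thread.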


 

Handy snippets for increasing swap size: https://askubuntu.com/a/927870/849369

Note that I was using NVMe 5500MB/s storage, so reading swap from disk is 11x faster than a normal 500MB/s SSD and about 55x faster than a 7200 RPM 100MB/s HDD, but still 4x slower than DDR4 RAM.
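The disk ratios follow directly from the sequential throughput figures quoted above. A small check, using only those numbers (the helper name speed_ratio is made up for this example, and no DDR4 figure is assumed):

```shell
#!/usr/bin/env bash
# Ratio of two throughput figures (MB/s), rounded to the nearest integer.
# NOTE: speed_ratio is an illustrative helper name.
speed_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a / b }'
}

speed_ratio 5500 500   # NVMe vs SATA SSD: prints 11
speed_ratio 5500 100   # NVMe vs 7200 RPM HDD: prints 55
```

These are peak sequential rates; swap traffic is often random access, so the real-world gap between devices may differ.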


I've tried both and I honestly prefer Xenome (as much as I appreciate BBTools in general) - for this use case, BBSplit takes more RAM as well as time whereas Xenome is relatively fast and takes less RAM. The only advantage with BBSplit is that it can read and write compressed files. A properly functioning Xenome binary can read compressed files but only writes out FASTQs, so temporary storage requirements can balloon.


Thanks for the suggestion. Where do you install it from? I see a github (https://github.com/data61/gossamer/tree/master) and docker image (https://hub.docker.com/r/mgibio/gossamer) for gossamer suite. Neither has been updated in 6+ years though.


Yes, the data61 gossamer is old and broken, actually. There is a version that used to be on one of the GitHub issues (https://github.com/data61/gossamer/issues/9#issuecomment-402958742) that the dev has moved to a product called XenoCell (https://gitlab.com/XenoCell/) that I am kinda-sorta maintaining now. You can pull the XenoCell docker image and run xenome using that.

Another option is cancerit's fix, which is here: https://github.com/cancerit/gossamer/ You can create a docker image or just download, install and run it.
