Question

BBMap tool error - randomreads.sh

1

Entering edit mode

3.4 years ago

GLR ▴ 20

Hello,

I'm trying to generate to simulated PE reads from the wheat reference genome using BBTools randomreads.sh and I keep receiving the following error message within a few seconds:

Exception in thread "main" java.lang.ArithmeticException: / by zero
        at align2.RandomReads3.fillRandomChrom(RandomReads3.java:1823)
        at align2.RandomReads3.<init>(RandomReads3.java:606)
        at align2.RandomReads3.<init>(RandomReads3.java:599)
        at align2.RandomReads3.main(RandomReads3.java:409)

The commands I am using are:

randomreads.sh ref=refgenome.fa.gz out1=gen_R1.fq out2=gen_R2.fq length=100 reads=2000 paired=t

Interestingly, the error appears to only affect the wheat genome. I've tried with both the full reference and several subgenomic sequences and receive the same error. When I try the same commands with tobacco and coffee genomes the program works beautifully. I've inspected the head of the reference genome files from all genomes and I see no differences in their format.

Can anyone provide any insight into this interesting error?

software error sequence simulatedseq bbmap • 2.2k views

ADD COMMENT • link updated 3.4 years ago by Ram 43k • written 3.4 years ago by GLR ▴ 20

1

Entering edit mode

Could the chromosomes be too long?

ADD REPLY • link 3.4 years ago by WouterDeCoster 47k

0

Entering edit mode

Ooh, excellent point. I'll go ahead and check the differences in sequence lengths between the files and get back to you. Any idea why long chromosomes might cause this error (just out of curiosity).

ADD REPLY • link 3.4 years ago by GLR ▴ 20

0

Entering edit mode

Regarding the sequence lengths, it appears you could be on to something. Both coffee's and tobacco's reference genomes' contigs are far smaller with tobaccos being on the scale of a few hundred thousand, coffees being on the scale of a few tens of millions and wheats being on the scale of hundreds of millions.

ADD REPLY • link 3.4 years ago by GLR ▴ 20

2

Entering edit mode

Allocate more memory to randomreads.sh with option -Xmx40gand see if that fixes this. You may need to add tens of gigabytes depending on size of genome but I would say start with 30-40G.

ADD REPLY • link 3.4 years ago by GenoMax 141k

0

Entering edit mode

Thank you, I'll give this a go. I'm using a shared system so I might have to queue a while, but I will get back to you with the result.

ADD REPLY • link 3.4 years ago by GLR ▴ 20

0

Entering edit mode

Hi GenoMax, apologies for the delay, I performed a few iterations with increasing memory allocation. Unfortunately I still get the error within seconds of the script running despite allocating up to 120Gb. Do you have any other suggestions?

ADD REPLY • link 3.4 years ago by GLR ▴ 20

0

Entering edit mode

Where did you get your wheat genomes from? I may download a copy and give this a try.

ADD REPLY • link 3.4 years ago by GenoMax 141k

0

Entering edit mode

I got them from ensemble. I'd love to hear what happens if you do get the chance.

ADD REPLY • link 3.4 years ago by GLR ▴ 20

0

Entering edit mode

I can confirm that I get the same error. I tried to build an index with bbmap.sh but that also generated a different error even after assigning 100G of RAM.

I have written to @Brian (author of BBMap). If he responds, I will post an update. He may respond directly here as well.

ADD REPLY • link 3.4 years ago by GenoMax 141k

0

Entering edit mode

Thank you so much GenoMax!

ADD REPLY • link 3.4 years ago by GLR ▴ 20

0

Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.
code_formatting

Note: Blockquotes are used to quote text, not show commands/code.

ADD REPLY • link 3.4 years ago by Ram 43k

0

Entering edit mode

Thank you, will do in future.

ADD REPLY • link 3.4 years ago by GLR ▴ 20

0

Entering edit mode

install seqkit the run:

 seqkit  stats refgenome.fa.gz

to get a sense of what the layout of your reference is across the chromosomes.

I suspect that some chromosomes might be too short and the random selection cannot operate correctly.

ADD REPLY • link 3.4 years ago by Istvan Albert 100k

0

Entering edit mode

Thank you, I'll run it for all the genomes and see what I get but strangely it runs beautifully on the genomes with short, yet highly abundant contigs. The tobacco genome I use has 168247 relatively short contigs and the program doesn't produce the error but it does for wheats 22 relatively long contigs, and for wheats' well-assembled 1A chromosome.

ADD REPLY • link 3.4 years ago by GLR ▴ 20

score 3 · Answer 1 · 2020-12-14

3

Entering edit mode

3.4 years ago

Brian Bushnell 20k

Hi! I'm sorry about this, but BBMap (and thus RandomReads, which uses the same index) does not support chromosomes over 500Mbp in length. Wheat is the only genome I know of that has such a long chromosome. If you break it up (e.g. with shred.sh shredding into lengths of under 500Mbp max) then it will work correctly.

ADD COMMENT • link 3.4 years ago by Brian Bushnell 20k

0

Entering edit mode

Brilliant, I'll shred it first then. Thank you so much!

ADD REPLY • link 3.4 years ago by GLR ▴ 20

score 1 · Answer 2 · 2020-12-10

1

Entering edit mode

3.4 years ago

ATpoint 82k

If more memory is no option because you are limited on RAM, maybe just splitting the chromosomes in like "half" chr1_1/chr1_2? I mean the split junction would not be properly covered but it would be neglectable I guess, just in case there is no way to get more RAM.

Edit: See the developer's response ( Brian Bushnell ), splitting chromosomes seems the only option here.

ADD COMMENT • link 3.4 years ago by ATpoint 82k

0

Entering edit mode

Great suggestion, thank you. I'll use this if the queue for the larger memory nodes are too long on the shared system I use.

ADD REPLY • link 3.4 years ago by GLR ▴ 20