lindsay.liang · 6.6 years ago
Hi, I'm running bwa mem (v. 0.7.15) on some whole exome sequencing fastqs (paired end, illumina) and I'm getting a segmentation fault very early on in the run. Here's the command:
bwa mem -t 8 -M -R "@RG\tID:D658\tPL:ILLUMINA\tSM:D658" localDir/human_g1k_v37.fasta localDir/D658_S6_L001_R1_001.fastq.gz localDir/D658_S6_L001_R2_001.fastq.gz
Here's the last part of the output:
@SQ SN:GL000200.1 LN:187035
@SQ SN:GL000193.1 LN:189789
@SQ SN:GL000194.1 LN:191469
@SQ SN:GL000225.1 LN:211173
@SQ SN:GL000192.1 LN:547496
@RG ID:D658 PL:ILLUMINA SM:D658
@PG ID:bwa PN:bwa VN:0.7.15-r1140 CL:bwa mem -t 8 -M -R @RG\tID:D658\tPL:ILLUMINA\tSM:D658 localDir/human_g1k_v37.fasta localDir/D658_S6_L001_R1_001.fastq.gz localDir/D658_S6_L001_R2_001.fastq.gz
[M::process] read 800000 sequences (80000000 bp)...
Segmentation fault
I'm running this on an AWS EC2 m4.2xlarge instance, which has 8 vCPUs and 32 GB of memory, so I don't think a lack of resources is the problem.
Any feedback would be much appreciated!
Hey Lindsay, segmentation faults are indeed usually related to memory or disk space, as you've implied. I believe that the standard root volume on EC2 is 8 GB - have you used pretty much all of that? Low disk space would be an issue too, and could provoke a segmentation fault.
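One way to rule out disk pressure before launching the run is to check the available space programmatically. A minimal sketch (the 10 GB threshold is a guess, not a bwa requirement):

```shell
#!/bin/sh
# Report free space (in KB) on the filesystem holding a given directory,
# using df's POSIX portable format so the output is a single parseable line.
free_kb() {
  df -Pk "${1:-.}" | awk 'NR==2 {print $4}'
}

avail=$(free_kb .)
echo "available: ${avail} KB"

# Bail out early if there isn't roughly 10 GB free for the SAM output:
# [ "$avail" -ge $((10 * 1024 * 1024)) ] || echo "low disk space"
```

Running `free` (or checking the 'KiB Mem' line in top) covers the RAM side of the same question.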
The other thing that I'd check is to ensure that you have indexed the 1000 Genomes reference FASTA with the same version of bwa that you are using for alignment.
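A quick way to check for a version mismatch is to capture the version string from bwa's usage banner and compare it against the version recorded when the index was built. A sketch - the awk parsing assumes the banner's "Version:" line format, and the paths are the ones from this thread:

```shell
#!/bin/sh
# Extract a version string like "0.7.15-r1140" from a bwa-style usage banner
# (bwa prints this banner to stderr when run with no arguments).
extract_bwa_version() {
  awk '/^Version:/ {print $2}'
}

# In practice you would pipe the real binary's output:
#   ver=$(bwa 2>&1 | extract_bwa_version)
# Demonstrated here with the banner text matching the @PG line above:
ver=$(printf 'Program: bwa (alignment via Burrows-Wheeler transformation)\nVersion: 0.7.15-r1140\n' | extract_bwa_version)
echo "$ver"

# If the index was built with a different version, rebuild it with the
# binary you align with (-a bwtsw is the algorithm for large genomes):
#   bwa index -a bwtsw localDir/human_g1k_v37.fasta
```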
Edit: if you also recently upgraded the RAM on your EC2 instance, it may take a few hours to optimise and for this extra RAM to become available.
Thanks for your response Kevin! When I first spun up the instance I added 20 GB of storage to it, so I don't think disk space is the problem. My reference file was also indexed with 0.7.15, so that's not the issue either.
Okay, and are you sure that the FASTQ files are correctly formatted? Have you tried to even start the run on another computer? How much RAM appears available when you run the top command in bash (look for 'KiB Mem'; exit top by pressing q)?

The fastqs look fine at first glance - R1 and R2 have the same number of lines (so there's an even number of reads), and just by using less they look properly formatted. I've spun up different instances and tried the run other times and there's no difference. The mem line in top says:
Mem: 32949384k total, 23330372k used, 9619012k free, 131032k buffers

What's using up 23.3 gigabytes of your RAM?!
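The R1/R2 line-count check described above can be scripted so it also catches truncated records, not just unequal pair counts. A sketch, using gzip -dc since the fastqs are gzipped:

```shell
#!/bin/sh
# Sanity-check a pair of gzipped fastq files: R1 and R2 must have the same
# number of lines, and the count must be a multiple of 4 (one record =
# header, sequence, '+', qualities).
check_pair() {
  r1_lines=$(gzip -dc "$1" | wc -l)
  r2_lines=$(gzip -dc "$2" | wc -l)
  if [ "$r1_lines" -ne "$r2_lines" ]; then
    echo "MISMATCH: $1 has $r1_lines lines, $2 has $r2_lines"
    return 1
  fi
  if [ $((r1_lines % 4)) -ne 0 ]; then
    echo "WARNING: line count $r1_lines is not a multiple of 4"
    return 1
  fi
  echo "OK: $((r1_lines / 4)) read pairs"
}

# Usage with the files from this thread:
# check_pair localDir/D658_S6_L001_R1_001.fastq.gz localDir/D658_S6_L001_R2_001.fastq.gz
```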
Ah. Sorry, I'm indexing the reference (again) in the background just to make sure that wasn't the problem.
Ok, I started a new instance after the reindexing was done. Top's memory line now looks like this:
KiB Mem : 32949384 total, 19087292 free, 126100 used, 13735992 buff/cache
Also, I'm no longer getting a segfault, but instead this:
This gets stranger by the minute! The same error was observed here: https://github.com/lh3/bwa/issues/120 Did you index from a compressed FASTA file?
Heng Li, the developer of BWA, even noted in the code that this "assertion failure should never happen" (see line 444: https://github.com/lh3/bwa/blob/master/bntseq.c#L444)
I just saw that! But alas no, my reference wasn't indexed from a compressed FASTA (the command I used was just
bwa index -a bwtsw human_g1k_v37.fasta
).

The final few things that I can suggest are:
So it turns out that one of my index files got corrupted during the transfer from EC2 to S3 (and back again), so after I reindexed everything (again) and transferred the files (again), everything seems to be working fine. :|
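For anyone hitting the same thing: comparing checksums before and after the transfer catches this kind of silent corruption. A minimal sketch (the directory layout and manifest path are illustrative):

```shell
#!/bin/sh
# Record an md5 manifest for the index files before uploading, then
# verify the copies that come back from the S3 round trip against it.
verify_roundtrip() {
  src_dir=$1   # directory holding the original files
  dst_dir=$2   # directory holding the files after the round trip
  ( cd "$src_dir" && md5sum ./* ) > /tmp/manifest.md5
  if ( cd "$dst_dir" && md5sum --quiet -c /tmp/manifest.md5 ); then
    echo "all files intact"
  else
    echo "corruption detected"
  fi
}

# verify_roundtrip /data/index_original /data/index_from_s3
```

Doing this routinely for large binary files (.bwt, .pac, .sa) is cheap insurance compared with debugging an assertion failure mid-alignment.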
(Just thought I'd post this for closure).
You deserve the up vote for that! :)