lindsay.liang · 6.6 years ago
Hi, I'm running bwa mem (v. 0.7.15) on some whole exome sequencing fastqs (paired end, illumina) and I'm getting a segmentation fault very early on in the run. Here's the command:
bwa mem -t 8 -M -R "@RG\tID:D658\tPL:ILLUMINA\tSM:D658" localDir/human_g1k_v37.fasta localDir/D658_S6_L001_R1_001.fastq.gz localDir/D658_S6_L001_R2_001.fastq.gz
Here's the last part of the output:
@SQ SN:GL000200.1 LN:187035
@SQ SN:GL000193.1 LN:189789
@SQ SN:GL000194.1 LN:191469
@SQ SN:GL000225.1 LN:211173
@SQ SN:GL000192.1 LN:547496
@RG ID:D658 PL:ILLUMINA SM:D658
@PG ID:bwa PN:bwa VN:0.7.15-r1140 CL:bwa mem -t 8 -M -R @RG\tID:D658\tPL:ILLUMINA\tSM:D658 localDir/human_g1k_v37.fasta localDir/D658_S6_L001_R1_001.fastq.gz localDir/D658_S6_L001_R2_001.fastq.gz
[M::process] read 800000 sequences (80000000 bp)...
Segmentation fault
I'm running this on an AWS EC2 m4.2xlarge instance, which has 8 vCPUs and 32 GB of memory, so I don't think a lack of resources is the problem.
Any feedback would be much appreciated!
Hey Lindsay, segmentation faults are indeed usually related to memory or disk space, as you've implied. I believe that the standard root volume on EC2 is 8 GB - have you used pretty much all of that? Low disk space would be an issue too, and could provoke a segmentation fault.
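One way to rule out disk pressure before launching the run is to check the available space programmatically. A minimal sketch (the 10 GB threshold is a guess, not a bwa requirement):

```shell
#!/bin/sh
# Report free space (in KB) on the filesystem holding a given directory,
# using df's POSIX portable format so the output is a single parseable line.
free_kb() {
  df -Pk "${1:-.}" | awk 'NR==2 {print $4}'
}

avail=$(free_kb .)
echo "available: ${avail} KB"

# Bail out early if there isn't roughly 10 GB free for the SAM output:
# [ "$avail" -ge $((10 * 1024 * 1024)) ] || echo "low disk space"
```

Running `free` (or checking the 'KiB Mem' line in top) covers the RAM side of the same question.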
The other thing that I'd check is to ensure that you have indexed the 1000 Genomes reference FASTA with the same version of bwa that you are using for alignment.
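A quick way to check for a version mismatch is to capture the version string from bwa's usage banner and compare it against the version recorded when the index was built. A sketch - the awk parsing assumes the banner's "Version:" line format, and the paths are the ones from this thread:

```shell
#!/bin/sh
# Extract a version string like "0.7.15-r1140" from a bwa-style usage banner
# (bwa prints this banner to stderr when run with no arguments).
extract_bwa_version() {
  awk '/^Version:/ {print $2}'
}

# In practice you would pipe the real binary's output:
#   ver=$(bwa 2>&1 | extract_bwa_version)
# Demonstrated here with the banner text matching the @PG line above:
ver=$(printf 'Program: bwa (alignment via Burrows-Wheeler transformation)\nVersion: 0.7.15-r1140\n' | extract_bwa_version)
echo "$ver"

# If the index was built with a different version, rebuild it with the
# binary you align with (-a bwtsw is the algorithm for large genomes):
#   bwa index -a bwtsw localDir/human_g1k_v37.fasta
```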
Edit: if you also recently upgraded the RAM on your EC2 instance, it may take a few hours to optimise and for this extra RAM to become available.
Thanks for your response Kevin! When I first spun up the instance I added 20 GB of storage to it, so I don't think disk space is the problem. My reference file was also indexed with 0.7.15, so that's not the issue either.
Okay, and are you sure that the FASTQ files are correctly formatted? Have you tried to even start the run on another computer? How much RAM appears available when you run the top command in bash (look for 'KiB Mem'; exit top by pressing q)?

The fastqs look fine at first glance - R1 and R2 have the same number of lines (so there's an even number of reads), and just by using less they look properly formatted. I've spun up different instances and tried the run other times and there's no difference. The mem line in top says:
Mem: 32949384k total, 23330372k used, 9619012k free, 131032k buffers

What's using up 23.3 gigabytes of your RAM?!
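The R1/R2 line-count check described above can be scripted so it also catches truncated records, not just unequal pair counts. A sketch, using gzip -dc since the fastqs are gzipped:

```shell
#!/bin/sh
# Sanity-check a pair of gzipped fastq files: R1 and R2 must have the same
# number of lines, and the count must be a multiple of 4 (one record =
# header, sequence, '+', qualities).
check_pair() {
  r1_lines=$(gzip -dc "$1" | wc -l)
  r2_lines=$(gzip -dc "$2" | wc -l)
  if [ "$r1_lines" -ne "$r2_lines" ]; then
    echo "MISMATCH: $1 has $r1_lines lines, $2 has $r2_lines"
    return 1
  fi
  if [ $((r1_lines % 4)) -ne 0 ]; then
    echo "WARNING: line count $r1_lines is not a multiple of 4"
    return 1
  fi
  echo "OK: $((r1_lines / 4)) read pairs"
}

# Usage with the files from this thread:
# check_pair localDir/D658_S6_L001_R1_001.fastq.gz localDir/D658_S6_L001_R2_001.fastq.gz
```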
Ah. Sorry, I'm indexing the reference (again) in the background just to make sure that wasn't the problem.
Ok, I started a new instance after the reindexing was done. Top's memory line now looks like this:
KiB Mem : 32949384 total, 19087292 free, 126100 used, 13735992 buff/cache
Also, I'm no longer getting a segfault, but instead this:
This gets stranger by the minute! The same error was observed here: https://github.com/lh3/bwa/issues/120 Did you index from a compressed FASTA file?
Heng Li, the developer of BWA, even noted in the code that this "assertion failure should never happen" (see line 444: https://github.com/lh3/bwa/blob/master/bntseq.c#L444)
I just saw that! But alas no, my reference wasn't indexed from a compressed FASTA (the command I used was just
bwa index -a bwtsw human_g1k_v37.fasta
).

The final few things that I can suggest are:
So it turns out that one of my index files got corrupted during the transfer from EC2 to S3 (and back again), so after I reindexed everything (again) and transferred the files (again), everything seems to be working fine. :|
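For anyone hitting the same thing: comparing checksums before and after the transfer catches this kind of silent corruption. A minimal sketch (the directory layout and manifest path are illustrative):

```shell
#!/bin/sh
# Record an md5 manifest for the index files before uploading, then
# verify the copies that come back from the S3 round trip against it.
verify_roundtrip() {
  src_dir=$1   # directory holding the original files
  dst_dir=$2   # directory holding the files after the round trip
  ( cd "$src_dir" && md5sum ./* ) > /tmp/manifest.md5
  if ( cd "$dst_dir" && md5sum --quiet -c /tmp/manifest.md5 ); then
    echo "all files intact"
  else
    echo "corruption detected"
  fi
}

# verify_roundtrip /data/index_original /data/index_from_s3
```

Doing this routinely for large binary files (.bwt, .pac, .sa) is cheap insurance compared with debugging an assertion failure mid-alignment.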
(Just thought I'd post this for closure).
You deserve the up vote for that! :)