Question: Why is bwa terminating alignment and exiting the script without an error message?
0
gravatar for mmats010
2.9 years ago by
mmats01060
mmats01060 wrote:

I am running a script through our school's cluster to trim and align some fastq files. The script itself, though a little unwieldy, has worked as expected on some files.

sickle pe -f 20-03_1_sequence.txt.gz -r 20-03_2_sequence.txt.gz -t sanger -o 20-03_F.fq.gz -p 20-03_R.fq.gz -s 20-03.trimmed_singles.fastq -x -g -l 125;
bwa mem -M -t 16 $RefSeq 20-03_F.fq.gz 20-03_R.fq.gz | samtools view -Sbh - | samtools sort -> 20-03_initial.bam;

Here, I trimmed the input 150x2 pe files "20-03_1_sequence.txt.gz" and "20-03_2_sequence.txt.gz" and outputted them as "20-03_F/R.fq.gz", when you can see incorporated into the bwa mem line just below. I fastQC-ed the before and after files, and the following processing steps after the "20-03_initial.bam" is made proceed as expected.

However, when I try this exact same script with another set of files, bwa simply exists after an hour or so.

sickle pe -f 1306_MKcombined_pair1.fastq.gz -r 1306_MKcombined_pair2.fastq.gz -t sanger -o 1306_MK_trimmed_F.fq.gz -p 1306_MK_trimmed_R.fq.gz -s 1306_MK_trimmed_singles.fastq -x -g -l 30;

These files are 50x2 pe and similarly successfully underwent fastQC. However, these two input files "1306_MKcombined_pair1/2.fastq.gz" are actually concatenated fastq files from 4 different sequencing lanes, combined into the final two pairs.

Now using the same command

bwa mem -M -t 32 $RefSeq 1306_MK_trimmed_F.fq.gz 1306_MK_trimmed_R.fq.gz | samtools view -Sbh > 1306_initialA.bam;

bwa exits without giving me any error message in the error file (it simply just stops processing)

....
[M::mem_pestat] (25, 50, 75) percentile: (317, 1056, 3028)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 8450)
[M::mem_pestat] mean and std.dev: (1793.22, 1875.27)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 11161)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR

the script exits, after an hour or so (it seems random), I'm left with a .bam file about half the size it should be, and none of the processing steps after the bwa mem command occur. Breaking up the alignment step into two steps (generating a .sam file, then samtools view to generate the .bam file) halts the script halfway through generating the .sam file.

Running the script directly from the command line gives the same result, an incomplete .bam/.sam file.

Sorry for such a long description, but would anyone know why this is occurring and why bwa is not giving me any indication of what is going wrong with the alignment?

sequencing bwa alignment genome • 1.3k views
ADD COMMENTlink modified 2.0 years ago by agathejouet0 • written 2.9 years ago by mmats01060

Hi,

I am having the same problem here. Anyone with a clue of what is happening? I am using the following command:

source bwa-0.7.15; 
source samtools-1.3.1;
bwa mem -t 4 ref_fasta read1 read2 | 
samtools view -hbS - |
samtools sort - -o outbam.sorted

It has worked fine for all my samples but two. However, I do not see any differences in the read files which is the only think I could think of for why these samples would fail.

The log files look like this, an unterminated bwa mem job:

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 160000 sequences (40000000 bp)...
[M::process] read 160000 sequences (40000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (17, 25380, 12, 17)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (116, 155, 345)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 803)
[M::mem_pestat] mean and std.dev: (222.94, 180.96)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1032)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (457, 533, 632)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (107, 982)
[M::mem_pestat] mean and std.dev: (538.15, 152.69)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1157)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (221, 734, 1953)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5417)
[M::mem_pestat] mean and std.dev: (1222.83, 1339.27)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 7149)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (155, 155, 388)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 854)
[M::mem_pestat] mean and std.dev: (229.71, 134.10)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1087)
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR

Thanks very much in advance for your help!

Agathe

EDIT: After some fiddling around, it seems that SLURM (to which I am submitting the command) was the culprit. There are different queues I can use to submit jobs and just changing that overcome the problem.

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by agathejouet0

Agathe, did you try running it twice on the same queue? Some of the NBI nodes were behaving strangely recently and if the problem can be pin-pointed to a node, then CIS will be happy if you let them know which node it is.

Also, you may want to consider using samtools view -buSh, so it doesn't compress the .bam data before passing them to samtools sort.

ADD REPLYlink written 2.0 years ago by cschu1811.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1537 users visited in the last hour