bwa aln calculate SA coordinate issue
9.2 years ago
Crystal

Hi All,

I'm using bwa to align my metagenomic data to a single bovine DNA sequence. I indexed the bovine sequence and ran

bwa aln index_file input.fastq > output.sai

It did give me the .sai file, but the whole process took a very long time (~8 hours per FASTQ file).

I have used bwa aln several times before, and it has never taken this long.
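For context, `bwa aln` writes a binary .sai file (suffix-array hits), not FASTQ, and a separate `bwa samse` step turns it into SAM alignments for single-end reads. The sketch below shows that two-step workflow, assuming the index already exists (built with `bwa index`) and that the `-f` output option is available on both subcommands (it is listed for `bwa aln` in the help text further down; check `bwa samse` usage on your version). `run_bwa_se`, `RUN`, and the file names are hypothetical; setting `RUN=echo` turns it into a dry run that only prints the commands.

```shell
#!/bin/sh
# Single-end bwa aln workflow: aln (.sai) -> samse (SAM).
# Assumes the reference index was already built with `bwa index`.
# Set RUN=echo to print the commands instead of executing them (dry run).
run_bwa_se() {
  ref=$1             # index prefix (e.g. the FASTA passed to bwa index)
  reads=$2           # single-end FASTQ
  threads=${3:-1}    # passed to bwa aln -t
  base=${reads%.fastq}
  # Step 1: find suffix-array hits for each read and write them to a .sai file.
  ${RUN:-} bwa aln -t "$threads" -f "$base.sai" "$ref" "$reads"
  # Step 2: convert the .sai hits into SAM alignments with reference coordinates.
  ${RUN:-} bwa samse -f "$base.sam" "$ref" "$base.sai" "$reads"
}

# Dry run with hypothetical file names: print the two commands for 4 threads.
RUN=echo run_bwa_se bovine.fa input.fastq 4
```

Writing output with `-f` rather than shell redirection keeps the dry-run mode honest, since `echo` output is not accidentally redirected into the .sai file.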

This is what I saw:

[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln_core] calculate SA coordinate... 310.66 sec
[bwa_aln_core] write to the disk... 0.06 sec
[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... 294.22 sec
[bwa_aln_core] write to the disk... 0.05 sec
[bwa_aln_core] 524288 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... 289.02 sec
[bwa_aln_core] write to the disk... 0.05 sec

It seems most of the time is spent calculating SA coordinates (~5 minutes per batch), whereas in the past this step took only ~0.3 sec.
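To put numbers on this, the "calculate SA coordinate" timings can be pulled out of bwa's stderr with awk. This is a minimal sketch assuming stderr was saved to a file (for example `bwa aln ... 2> aln.log`; the name aln.log and the helper `sa_time_summary` are hypothetical) and that the log lines keep the format shown above, with the seconds as the second-to-last field.

```shell
#!/bin/sh
# Summarise the "calculate SA coordinate" timings that bwa aln writes to stderr.
# Reads the files given as arguments, or stdin if none are given.
sa_time_summary() {
  awk '/calculate SA coordinate/ {
         n++
         total += $(NF-1)    # seconds value is the second-to-last field
       }
       END {
         if (n > 0) printf "batches=%d total=%.2f sec mean=%.2f sec\n", n, total, total / n
       }' "$@"
}

# Example using the log excerpt from the question, fed on stdin:
sa_time_summary <<'EOF'
[bwa_aln_core] calculate SA coordinate... 310.66 sec
[bwa_aln_core] write to the disk... 0.06 sec
[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... 294.22 sec
[bwa_aln_core] write to the disk... 0.05 sec
[bwa_aln_core] 524288 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... 289.02 sec
[bwa_aln_core] write to the disk... 0.05 sec
EOF
```

For this excerpt it reports 3 batches totalling 893.90 sec, a mean of about 298 sec per 262,144-read batch, roughly 1,000 times the ~0.3 sec per batch seen previously.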

What could be causing this (memory? the computer's RAM size? the internet connection?)

Thanks


Have you tried using more than one thread?


Hi Devon,

No, I've never tried that, because it always ran quickly in the past.

How do I specify this in the command?

Thanks

9.1 years ago
mark.ziemann

It might be running slower because you're querying a larger genome.

The bwa aln help page says that the -t option lets you use more than one thread.

On a 4-core machine:

bwa aln -t 4 index_file input.fastq > output.sai

should run roughly 3-4 times faster.
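Rather than hard-coding 4, the thread count can be taken from the machine. A minimal sketch, assuming `nproc` (GNU coreutils) or `getconf _NPROCESSORS_ONLN` is available and falling back to a single thread otherwise; `pick_threads` is a hypothetical helper name.

```shell
#!/bin/sh
# Pick a thread count for `bwa aln -t` from the number of online CPU cores.
pick_threads() {
  n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null)
  case "$n" in
    ''|*[!0-9]*|0) n=1 ;;   # empty, non-numeric or zero -> single thread
  esac
  echo "$n"
}

threads=$(pick_threads)
echo "bwa aln would use -t $threads"
# With the file names from the question:
# bwa aln -t "$threads" index_file input.fastq > output.sai
```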

Usage:   bwa aln [options] <prefix> <in.fq>

Options: -n NUM    max #diff (int) or missing prob under 0.02 err rate (float) [0.04]
         -o INT    maximum number or fraction of gap opens [1]
         -e INT    maximum number of gap extensions, -1 for disabling long gaps [-1]
         -i INT    do not put an indel within INT bp towards the ends [5]
         -d INT    maximum occurrences for extending a long deletion [10]
         -l INT    seed length [32]
         -k INT    maximum differences in the seed [2]
         -m INT    maximum entries in the queue [2000000]
         -t INT    number of threads [1]
         -M INT    mismatch penalty [3]
         -O INT    gap open penalty [11]
         -E INT    gap extension penalty [4]
         -R INT    stop searching when there are >INT equally best hits [30]
         -q INT    quality threshold for read trimming down to 35bp [0]
         -f FILE   file to write output to instead of stdout
         -B INT    length of barcode
         -L        log-scaled gap penalty for long deletions
         -N        non-iterative mode: search for all n-difference hits (slooow)
         -I        the input is in the Illumina 1.3+ FASTQ-like format
         -b        the input read file is in the BAM format
         -0        use single-end reads only (effective with -b)
         -1        use the 1st read in a pair (effective with -b)
         -2        use the 2nd read in a pair (effective with -b)
         -Y        filter Casava-filtered sequences
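The `[bwa_aln] NNbp reads: max_diff = K` lines in the question's log come from the `-n` default of 0.04 listed above: when `-n` is a fraction, bwa converts it into a per-read-length maximum number of differences. The awk sketch below reproduces that table under our reading of bwa's source (the `bwa_cal_maxdiff` routine), which we treat as an assumption: differences follow a Poisson model with a 2% per-base error rate, and max_diff is the smallest k for which P(more than k errors) drops below the `-n` value. `maxdiff_table` is a hypothetical helper name.

```shell
#!/bin/sh
# Reproduce the max_diff table that bwa aln prints for a fractional -n value.
# Assumption: Poisson model, 0.02 per-base error rate, read lengths 17..250.
maxdiff_table() {
  awk -v thres="${1:-0.04}" 'BEGIN {
    err = 0.02                         # per-base error rate in the -n (float) model
    prev = 0
    for (l = 17; l <= 250; l++) {      # read lengths scanned for the printed table
      lambda = l * err                 # expected number of errors in an l-bp read
      elambda = exp(-lambda)
      sum = elambda; y = 1; x = 1; md = 2
      for (k = 1; k < 1000; k++) {
        y *= lambda; x *= k            # lambda^k and k!
        sum += elambda * y / x         # cumulative Poisson P(X <= k)
        if (1.0 - sum < thres) { md = k; break }
      }
      if (md != prev) printf "[bwa_aln] %dbp reads: max_diff = %d\n", l, md
      prev = md
    }
  }'
}

maxdiff_table 0.04
```

With the default of 0.04 this yields the same eight lines as the log in the question; a smaller `-n` fraction allows more differences per read, which widens the search.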