Question: Bwa: How To Get Multiple Hits Using The Algorithm Bwasw?
2
7.8 years ago by
Cristina40
Cristina40 wrote:

Hi,

I am working with 454 data.

I am using BWA to mapping the reads onto the assembled contigs and I want to known all the possibilities.

I test that with samse and -n option is reliable if your input are short reads but not work with long reads (bwasw algorithm), Does someone known How can obtain multiple Hits (tag XA) using the bwasw algorithm?

``````bwa index -a is reference.fasta

bwa aln -N reference.fasta reads.fastq > aln_sa.sai

bwa samse -n 10 reference.fasta aln_sa.sai reads.fastq > aln.sam

less aln.sam

@SQ SN:gen_1 LN:910
@SQ SN:gen_2 LN:910
@SQ SN:gen_3 LN:910
@PG ID:bwa PN:bwa VN:0.5.9-r16
BCM10 0 gen_1 1 17 51M * 0 0
TTCATTACTTATTTGTAAAGCTTCTCTTTACAAAAGTCTCTTGTATTTTCC
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIAIIIB5II+.9I\$\$ XT:A:U NM:i:0 X0:i:1
X1:i:4 XM:i:0 XO:i:0 XG:i:0 MD:Z:51
XA:Z:gen_2,+1,51M,2;gen_3,+1,51M,2;gen_1,+1,50M1S,2;gen_1,+1,51M,3;

BCM11 0 gen_1 71 20 51M * 0 0
GAGCAGTATTTTAAAATTCTGCCAATGGAGCTTAAGAGTGTTTATATCCGC
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII<IIII6III8@ XT:A:U NM:i:0 X0:i:1
X1:i:2 XM:i:0 XO:i:0 XG:i:0 MD:Z:51
XA:Z:gen_3,+71,51M,2;gen_2,+71,51M,2;

BCM14 0 gen_1 351 0 51M * 0 0
GCAATTGCATCATTGACCACAACAACAGATATGAACCTAGCGGCTGCCCAA
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIAIIIB5II+.9I\$\$ XT:A:R NM:i:0 X0:i:3
X1:i:6 XM:i:0 XO:i:0 XG:i:0 MD:Z:51
XA:Z:gen_3,+351,51M,0;gen_2,+351,51M,0;gen_1,+351,51M,3;gen_3,+351,51M,3;gen_2,+351,51M,3;gen_1,+351,50M1S,3;gen_3,+351,50M1S,3;gen_2,+351,50M1S,3;
``````
multiple bwa • 3.8k views
modified 7.8 years ago by Michael Dondrup47k • written 7.8 years ago by Cristina40
1
7.8 years ago by
Bergen, Norway
Michael Dondrup47k wrote:

From the documentation:

-z INT Z-best heuristics. Higher -z increases accuracy at the cost of speed. 1

According to this, setting `-z` to > 1 will give n best hits, given the alignments are present and satisfy the score threshold. I haven't tested this, but it would be very easy to find out if it works. Simply make a test reference containing a single read sequence twice and align this sequence to the test reference.

I think this yields multiple lines in the sam file, but I don't know if the tag will be used correctly.

Edit: Tested this but `-z` doesn't seem to have any effect.

Here's my input query:

``````>testseq
AGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCA
``````

reference:

``````>refdummyseq1
AGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAG
>refdummyseq2
AGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAG
``````

So the reference contains multiple copies of the query, but whatever I do with `-z` it always looks the same:

``````\$ bwa bwasw -z 999 test.fna test.fas
@SQ SN:refdummyseq1 LN:392
@SQ SN:refdummyseq2 LN:392
[bsw2_aln] read 1 sequences/pairs (80 bp)...
testseq 0   refdummyseq1    1   0   80M *   0   0   AGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAAS:i:80 XS:i:80 XF:i:3  XE:i:0  NM:i:0
[main] Version: 0.6.1-r104
[main] CMD: bwa bwasw -z 999 test.fna test.fas
[main] Real time: 0.003 sec; CPU: 0.005 sec
``````

Conclusion: you might have to use a different aligner to do this would blat work for you, or do you have too many reads?

I create two identical reference sequences, I want to know If a read can mapped onto different references.

When I set z = 1 the aligment print one sequence (the first in the file) and when I change the value of z = 5 the alignment print the other sequence (the second in the file).

So, I have to make 2 different alignment if I want to retrieve the alternative Hits because don't appear in the same sam file.

Question: The two sequences are identical, Why with z=1 retrieve the first reference sequence and with the z=5 retrieve the second reference sequence?

Now I tried it with a simple test file and didn't seem to have any effect!