Bwa: How To Get Multiple Hits Using The Algorithm Bwasw?
1
2
Entering edit mode
12.2 years ago
Cristina ▴ 40

Hi,

I am working with 454 data.

I am using BWA to mapping the reads onto the assembled contigs and I want to known all the possibilities.

I test that with samse and -n option is reliable if your input are short reads but not work with long reads (bwasw algorithm), Does someone known How can obtain multiple Hits (tag XA) using the bwasw algorithm?

EX: Test with short reads

bwa index -a is reference.fasta

bwa aln -N reference.fasta reads.fastq > aln_sa.sai

bwa samse -n 10 reference.fasta aln_sa.sai reads.fastq > aln.sam

less aln.sam


@SQ SN:gen_1 LN:910
@SQ SN:gen_2 LN:910
@SQ SN:gen_3 LN:910
@PG ID:bwa PN:bwa VN:0.5.9-r16
BCM10 0 gen_1 1 17 51M * 0 0
TTCATTACTTATTTGTAAAGCTTCTCTTTACAAAAGTCTCTTGTATTTTCC
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIAIIIB5II+.9I$$ XT:A:U NM:i:0 X0:i:1
X1:i:4 XM:i:0 XO:i:0 XG:i:0 MD:Z:51
XA:Z:gen_2,+1,51M,2;gen_3,+1,51M,2;gen_1,+1,50M1S,2;gen_1,+1,51M,3;

BCM11 0 gen_1 71 20 51M * 0 0
GAGCAGTATTTTAAAATTCTGCCAATGGAGCTTAAGAGTGTTTATATCCGC
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII<IIII6III8@ XT:A:U NM:i:0 X0:i:1
X1:i:2 XM:i:0 XO:i:0 XG:i:0 MD:Z:51
XA:Z:gen_3,+71,51M,2;gen_2,+71,51M,2;

BCM14 0 gen_1 351 0 51M * 0 0
GCAATTGCATCATTGACCACAACAACAGATATGAACCTAGCGGCTGCCCAA
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIAIIIB5II+.9I$$ XT:A:R NM:i:0 X0:i:3
X1:i:6 XM:i:0 XO:i:0 XG:i:0 MD:Z:51
XA:Z:gen_3,+351,51M,0;gen_2,+351,51M,0;gen_1,+351,51M,3;gen_3,+351,51M,3;gen_2,+351,51M,3;gen_1,+351,50M1S,3;gen_3,+351,50M1S,3;gen_2,+351,50M1S,3;
bwa multiple • 5.2k views
ADD COMMENT
1
Entering edit mode
12.2 years ago
Michael 54k

From the documentation:

-z INT Z-best heuristics. Higher -z increases accuracy at the cost of speed. 1

According to this, setting -z to > 1 will give n best hits, given the alignments are present and satisfy the score threshold. I haven't tested this, but it would be very easy to find out if it works. Simply make a test reference containing a single read sequence twice and align this sequence to the test reference.

I think this yields multiple lines in the sam file, but I don't know if the tag will be used correctly.

Edit: Tested this but -z doesn't seem to have any effect.

Here's my input query:

>testseq
AGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCA

reference:

>refdummyseq1
AGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAG
>refdummyseq2
AGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAG

So the reference contains multiple copies of the query, but whatever I do with -z it always looks the same:

$ bwa bwasw -z 999 test.fna test.fas 
@SQ SN:refdummyseq1 LN:392
@SQ SN:refdummyseq2 LN:392
[bsw2_aln] read 1 sequences/pairs (80 bp)...
testseq 0   refdummyseq1    1   0   80M *   0   0   AGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAAS:i:80 XS:i:80 XF:i:3  XE:i:0  NM:i:0
[main] Version: 0.6.1-r104
[main] CMD: bwa bwasw -z 999 test.fna test.fas
[main] Real time: 0.003 sec; CPU: 0.005 sec

Conclusion: you might have to use a different aligner to do this would blat work for you, or do you have too many reads?

ADD COMMENT
0
Entering edit mode

Thanks for your reply Michael.

I create two identical reference sequences, I want to know If a read can mapped onto different references.

When I set z = 1 the aligment print one sequence (the first in the file) and when I change the value of z = 5 the alignment print the other sequence (the second in the file).

So, I have to make 2 different alignment if I want to retrieve the alternative Hits because don't appear in the same sam file.

Question: The two sequences are identical, Why with z=1 retrieve the first reference sequence and with the z=5 retrieve the second reference sequence?

ADD REPLY
0
Entering edit mode

Now I tried it with a simple test file and didn't seem to have any effect!

ADD REPLY
0
Entering edit mode

Ok, Thanks Michael

ADD REPLY

Login before adding your answer.

Traffic: 2131 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6