Question: Bwa: How To Get Multiple Hits Using The Algorithm Bwasw?
2
gravatar for Cristina
7.1 years ago by
Cristina40
Cristina40 wrote:

Hi,

I am working with 454 data.

I am using BWA to mapping the reads onto the assembled contigs and I want to known all the possibilities.

I test that with samse and -n option is reliable if your input are short reads but not work with long reads (bwasw algorithm), Does someone known How can obtain multiple Hits (tag XA) using the bwasw algorithm?

EX: Test with short reads

bwa index -a is reference.fasta

bwa aln -N reference.fasta reads.fastq > aln_sa.sai

bwa samse -n 10 reference.fasta aln_sa.sai reads.fastq > aln.sam

less aln.sam


@SQ SN:gen_1 LN:910
@SQ SN:gen_2 LN:910
@SQ SN:gen_3 LN:910
@PG ID:bwa PN:bwa VN:0.5.9-r16
BCM10 0 gen_1 1 17 51M * 0 0
TTCATTACTTATTTGTAAAGCTTCTCTTTACAAAAGTCTCTTGTATTTTCC
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIAIIIB5II+.9I$$ XT:A:U NM:i:0 X0:i:1
X1:i:4 XM:i:0 XO:i:0 XG:i:0 MD:Z:51
XA:Z:gen_2,+1,51M,2;gen_3,+1,51M,2;gen_1,+1,50M1S,2;gen_1,+1,51M,3;

BCM11 0 gen_1 71 20 51M * 0 0
GAGCAGTATTTTAAAATTCTGCCAATGGAGCTTAAGAGTGTTTATATCCGC
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII<IIII6III8@ XT:A:U NM:i:0 X0:i:1
X1:i:2 XM:i:0 XO:i:0 XG:i:0 MD:Z:51
XA:Z:gen_3,+71,51M,2;gen_2,+71,51M,2;

BCM14 0 gen_1 351 0 51M * 0 0
GCAATTGCATCATTGACCACAACAACAGATATGAACCTAGCGGCTGCCCAA
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIAIIIB5II+.9I$$ XT:A:R NM:i:0 X0:i:3
X1:i:6 XM:i:0 XO:i:0 XG:i:0 MD:Z:51
XA:Z:gen_3,+351,51M,0;gen_2,+351,51M,0;gen_1,+351,51M,3;gen_3,+351,51M,3;gen_2,+351,51M,3;gen_1,+351,50M1S,3;gen_3,+351,50M1S,3;gen_2,+351,50M1S,3;
multiple bwa • 3.6k views
ADD COMMENTlink modified 7.1 years ago by Michael Dondrup45k • written 7.1 years ago by Cristina40
1
gravatar for Michael Dondrup
7.1 years ago by
Bergen, Norway
Michael Dondrup45k wrote:

From the documentation:

-z INT Z-best heuristics. Higher -z increases accuracy at the cost of speed. 1

According to this, setting -z to > 1 will give n best hits, given the alignments are present and satisfy the score threshold. I haven't tested this, but it would be very easy to find out if it works. Simply make a test reference containing a single read sequence twice and align this sequence to the test reference.

I think this yields multiple lines in the sam file, but I don't know if the tag will be used correctly.

Edit: Tested this but -z doesn't seem to have any effect.

Here's my input query:

  >testseq
  AGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCA

reference:

>refdummyseq1
AGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAG
>refdummyseq2
AGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAGAGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAGCGTTCACTCTGAACCAG

So the reference contains multiple copies of the query, but whatever I do with -z it always looks the same:

$ bwa bwasw -z 999 test.fna test.fas 
@SQ SN:refdummyseq1 LN:392
@SQ SN:refdummyseq2 LN:392
[bsw2_aln] read 1 sequences/pairs (80 bp)...
testseq 0   refdummyseq1    1   0   80M *   0   0   AGCCACTCTCATTCTTAATCGTCCACCGAAGCTTCTAATTAAGAATCTCGTTCGACTTGCATGTCTTAACCACGCCGCCAAS:i:80 XS:i:80 XF:i:3  XE:i:0  NM:i:0
[main] Version: 0.6.1-r104
[main] CMD: bwa bwasw -z 999 test.fna test.fas
[main] Real time: 0.003 sec; CPU: 0.005 sec

Conclusion: you might have to use a different aligner to do this would blat work for you, or do you have too many reads?

ADD COMMENTlink modified 7.1 years ago • written 7.1 years ago by Michael Dondrup45k

Thanks for your reply Michael.

I try to do it.

I create two identical reference sequences. When I set z = 1 the aligment print one sequence (the first in the file) and when I change the value of z = 5 the alignment print the other sequence (the second in the file). I have to make 2 different align if I want to retrieve the alternative Hits because don't appear in the same sam file.

Question

The two sequences are identical, Why with z=1 retrieve the first reference sequence and with the z=5 retrieve the second reference sequence?

Other Question

This approach only work with 2 i

ADD REPLYlink written 7.1 years ago by Cristina40

Thanks for your reply Michael.

I create two identical reference sequences, I want to know If a read can mapped onto different references

When I set z = 1 the aligment print one sequence (the first in the file) and when I change the value of z = 5 the alignment print the other sequence (the second in the file).

So, I have to make 2 different alignment if I want to retrieve the alternative Hits because don't appear in the same sam file.

Question: The two sequences are identical, Why with z=1 retrieve the first reference sequence and with the z=5 retrieve the second reference sequence?

ADD REPLYlink written 7.1 years ago by Cristina40

Thanks for your reply Michael.

I create two identical reference sequences, I want to know If a read can mapped onto different references.

When I set z = 1 the aligment print one sequence (the first in the file) and when I change the value of z = 5 the alignment print the other sequence (the second in the file). So, I have to make 2 different alignment if I want to retrieve the alternative Hits because don't appear in the same sam file.

Question: The two sequences are identical, Why with z=1 retrieve the first reference sequence and with the z=5 retrieve the second reference sequence?

ADD REPLYlink written 7.1 years ago by Cristina40

Thanks for your reply Michael.

ADD REPLYlink written 7.1 years ago by Cristina40

I create two identical reference sequences, I want to know If a read can mapped onto different references.

When I set z = 1 the aligment print one sequence (the first in the file) and when I change the value of z = 5 the alignment print the other sequence (the second in the file).

So, I have to make 2 different alignment if I want to retrieve the alternative Hits because don't appear in the same sam file. #Question: The two sequences are identical, Why with z=1 retrieve the first reference sequence and with the z=5 retrieve the second reference sequence?

ADD REPLYlink written 7.1 years ago by Cristina40

I create two identical reference sequences, I want to know If a read can mapped onto different references. When I set z = 1 the aligment print one sequence (the first in the file) and when I change the value of z = 5 the alignment print the other sequence (the second in the file).

ADD REPLYlink written 7.1 years ago by Cristina40

So, I have to make 2 different alignment if I want to retrieve the alternative Hits because don't appear in the same sam file.

ADD REPLYlink written 7.1 years ago by Cristina40

Now I tried it with a simple test file and didn't seem to have any effect!

ADD REPLYlink written 7.1 years ago by Michael Dondrup45k

Question: The two sequences are identical, Why with z=1 retrieve the first reference sequence and with the z=5 retrieve the second reference sequence?

ADD REPLYlink written 7.1 years ago by Cristina40

Ok, Thanks Michael

ADD REPLYlink written 7.1 years ago by Cristina40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1078 users visited in the last hour