Entering edit mode
11.8 years ago
Leszek
4.2k
We expect deletions (10-1000 bases) in the genome of interest, therefore, we run paired MiSeq 300bp+200bp. Can you recommend me the best aligner for such dataset?
In theory, some programs should handle it right:
- bowtie2 supports gapped alignments in end-to-end mode, but in practice I haven't seen big indels
- bwa aln - there are -o and -e parameters, anyone tried it?
- bwa bwasw - align reads with deletion twice: up- and downstream from the deletion, the other part of the read is clipped.
- blat - slow and reports multiple suboptimal hits, plus no SAM output
- any other recommendations?
What is your opinion?
EDIT
Just to make it clear, pairs are overlapping, after merging with FLASH I get single reads ~400 bases.
example:
@M00724:10:000000000-A2EKA:1:1101:16915:1666 1:N:0:1
TTATCTTGAAAAAATGCACCCGCAGCTTCGCTCGTAATCCGAAAACACGGGAAATGGAGTCAGGCTTTTTTTATGGATGAGAAAATAGACACCAAAGAAGCCTTCATATGACCTTAACGGACCTACAATGCAAAAAGTTATCAAAAGACTGCATTATTGAGCACACATAGTACTAATAAAGTAATCTAAGAAGCGTGGTTAGAAAGATAGCGATCTCGTGATGCATTTTTGTAGAACAAAAAAGAATTAGAGATTCTTTGTTGGTAAAATAGCTCTCTCGCGTTGCATTTCTGTTCTGTAAAAATGCAGATCTGATTCTTTGTATGAACAAGTAGCGCTCTCGCGTTGCATTTTTGTTCTACAAAATGAAGCAAAGATGCTTCGTTAACAAAGATATGCTATTGAAGTGCAAGTT
+
5<=,<>>/----5+<-ACC8+>+>>+8C.++7++>+,78.++55>-+,5**+5+------5----5-5CE))4=+4++4+4+4++++44@+11*1***3**;38@*1****11*1*0191:)-./6?(666(6?6(-(/(.((6((./((((../66((((/((./((.(./((/(((((//6<ee;?=(6((.' ('="" (="" (="" .6(6<(.(''(="" 66##(="" ;6;6ee="?;(<;###./(///#(#(#(/(6#6(;..(<=<;/6/6<#6///(#)98/#8*@23*#<*;3#?<4#24+<#4+++C6++4++++6++++D=C56,,,56+-A)DC5*5***C*C+E-FCC7+,EEEEAD9A7...99.AD.A.9.A-9-">+A7+9/AEA8//AC/C@C;>9<-5-5-->
should align to 2-micron plasmid of S. cerevisiae with one deletion of 125 bases (blat .psl output)
380 35 0 0 0 0 1 125 + M00724:10:000000000-A2EKA:1:1101:16915:1666 415 0 415 2-micron 6318 3937 4477 2 59,356, 0,59, 3937,4121,
Try: "bwa bwasw -b9 -q16 -r1 -w200". Let me know what you get. Thanks.
I will be curious to see how BWA works on such long reads.
It is bwa-sw, not bwa. Bwa-sw is known to work with 150kb BACs and even 1Mbp sequences.
BWA-SW does not work with PE? Unless it is being used after joining reads with FLASH?
It has been working with PE for a year or so.
it splits correct alignments obtained with default parameters. the reads with deletions are aligned twice: upstream and downstream from the deletion, the other part of the read is clipped.
That is exactly why I was asking you to use the non-default parameters. You will get longer deletions, though not longer than half of read length. As you are working on small genomes, why not de novo assemble your reads first?