BWA and SOAP2 cannot map a particular sequence which exists in the genome
2
1
Entering edit mode
10.0 years ago

Hi

I have tried to map this short sequence

>sample_1_x12179
TCCTGTACTGAGCTGCCCCGA

which exists on chr8 twice, using various aligners.

BLAT, BLAST, and Bowtie2 found it. BWA MEM and SOAP2 did not. How come? BWA MEM finds it if I include some extra flanking bases. Is it because the sequence is too short?

BWA MEM command:

bwa mem -t 8 $bwaIndex tmp.fasta

SOAP command:

./soap2.20 \
 -a tmp.fasta \
 -o ${goutputfile}.tmp \
 -D /mnt/NGS01/ReferenceDB/mirTools/db/genome/hsa.fa.index \
 -M 0 -r 2 -v 5 -p 8 -l 15

I also tried -M 4 (find best hits), but still no results. If I remove the last two bases from the above sequence, then SOAP2 finds a hit on chr11 that requires 2 mismatches.

Thank you.

alignment bwa soap2 • 3.5k views
ADD COMMENT
2
Entering edit mode

You're not using the bwa mem command in your example. And it wouldn't work anyway because bwa mem expects a 70bp minimum query sequence length for aligning. Does running bwa aln without the mem like you've shown work okay?

ADD REPLY
0
Entering edit mode

Sorry, I did use 'bwa mem', just accidentally posted the wrong command. I'll try 'bwa aln' too. Thank you for your answer.

ADD REPLY
0
Entering edit mode

Can you post your SOAP command as well?

ADD REPLY
0
Entering edit mode

I've edited my post and added the SOAP2 command.

ADD REPLY
3
Entering edit mode
10.0 years ago
Dan D 7.4k

BWA works for me (but not bwa mem, because it requires 70bp minimum query sequence length):

bwa aln $GRCh37 test.fa > test.sai
bwa samse $GRCh37 test.sai test.fa > test.sam

sample_1_x12179      16      8       41518003        0       21M     *       0 0TCGGGGCAGCTCAGTACAGGA   *       XT:A:R  NM:i:0  X0:i:2  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:21 XA:Z:8,+41517962,21M,0;
_____________________________^_______________________________^^^

The documentation for SOAP2 sucks, but I'm guessing it's an issue with your query read length.

ADD COMMENT
2
Entering edit mode
10.0 years ago
lh3 33k
bwa mem -aT20 $GRC37 test.fa

should give you the result and give you all the perfect matches. However, once there are errors in your short sequences, bwa-aln will have a bigger chance to find the hit.

ADD COMMENT
0
Entering edit mode

Thank you. It seems to work. I'll read more up on the various parameter settings.

ADD REPLY

Login before adding your answer.

Traffic: 2365 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6