Mismatch issue in bwa
2
2
Entering edit mode
6.4 years ago

Hi everyone,

My aim is to map a fastq file (single-end) to two short reference sequences, to afterward split my reads file in two, according to the reference mapped. My reference sequences are 30bp long and my reads in my fastq file are 50-150bp long.

The problem is, bwa do not allow a mismatch in such short sequences and I would like too

What i tried :

bwa index references.fa
bwa mem references.fa reads.fq

SAM sub-result :

M02945:227:000000000-BHPRH:1:1101:17089:1699 0 ESR 1 60 30M66S * 0 0 GAGAGCTCTATTTCTTCGGTGTTACCAGCTCCCGCTCCTGCGTGTCCTCTGCGCTGCCTGTCCCATCCATTCTTATTCCACGCGTGCCCTATAGTC 11>1>AFFFFFFGGGGGGGGGGGCGGBBEGHHEEEEE?GHHCE/E0AFDGGBE/EE?FHGHFFBFFBGGHFHGHFHH2DE1///>/>CEF0@12B@ NM:i:0 MD:Z:30 AS:i:30 XS:i:0 M02945:227:000000000-BHPRH:1:1101:18100:1700 4 * 0 0 * * 0 0 GAGAGCTCTATTACTTCACTGGAGGGAAGCTCCCACGCGTGCCCTATAGTC 11>>AAFFFDFFFFGGGFFGGB1CEG?A0FGHFCBF00AE?EF0F0B12FD AS:i:0 XS:i:0

bwa-mem works great but on the "M02945:227:000000000-BHPRH:1:1101:18100:1700" read, there is a mismatch and bwa flagged this read as unmapped (FLAG = 4)

I also try to decrease the mismatch score to 0 bwa-mem -B 0 references.fa reads.fq and to manually generate a mismatch in a mapped read to see if the read length has an impact on the result.

Both did not work.

In my last attempt I used bwa-aln + bwa-samse instead of bwa-mem

bwa aln references.fa reads.fq > bwa_aln.sai
bwa samse references.fa bwa_aln.sai reads.fq

SAM sub-result :

M02945:227:000000000-BHPRH:1:1101:17089:1699 4 * 0 0 * * 0 0 GAGAGCTCTATTTCTTCGGTGTTACCAGCTCCCGCTCCTGCGTGTCCTCTGCGCTGCCTGTCCCATCCATTCTTATTCCACGCGTGCCCTATAGTC 11>1>AFFFFFFGGGGGGGGGGGCGGBBEGHHEEEEE?GHHCE/E0AFDGGBE/EE?FHGHFFBFFBGGHFHGHFHH2DE1///>/>CEF0@12B@ M02945:227:000000000-BHPRH:1:1101:18100:1700 4 * 0 0 * * 0 0 GAGAGCTCTATTACTTCACTGGAGGGAAGCTCCCACGCGTGCCCTATAGTC 11>>AAFFFDFFFFGGGFFGGB1CEG?A0FGHFCBF00AE?EF0F0B12FD

No one mapped...

I am a bit lost, is there a proper way to force bwa to allow a mismatch in short reads mapping or an other tool specific to this ?

Bwa version : Version: 0.7.12-r1039

Reference sequences :

>ESR

GAGAGCTCTATTTCTTCGGTGTTACCAGCT

>Cercle

GAGAGCTCTATTACTGCACTGGAGGGAAGC

Thanks a lot !

bwa bwa-mem bwa-aln bwa-samse short reads • 3.8k views
ADD COMMENT
1
Entering edit mode

My reference sequences are 30pb long and my reads in my fastq file are 50-150pb long.

I've never done anything like this, but I have my doubts whether you are using the right tool for the job.

By the way, do you mean bp instead of pb? Might be language confusion, but bp = basepairs et pb est pairs de base? Je ne sais pas.

ADD REPLY
0
Entering edit mode

Yes, sorry, french confusion here i meant bp in this post. May blast be better on this task ?

ADD REPLY
1
Entering edit mode

I have my doubts whether you are using the right tool for the job

Following Wouter's comment, in a recent post (A: Best tool for finding short alignment between to sequences) I suggested vmatch. Maybe you can have a look to see if it's more appropriate then bwa.

ADD REPLY
1
Entering edit mode
6.4 years ago

In any aligner worth using, you can adjust the seed length. You can do it in bwa as well:

   -k INT        minimum seed length [19]
ADD COMMENT
0
Entering edit mode

Thanks for advice, I tried different length of seed and sadly no impact on results.

ADD REPLY
0
Entering edit mode

Did you also lower the minimum accepted score? I think there's an option for that.

ADD REPLY
1
Entering edit mode
6.4 years ago
GenoMax 141k

You could give bbmap.sh from BBMap suite a try. There are many options you can play with.

Input sequence

@M02945:227:000000000-BHPRH:1:1101:17089:1699
GAGAGCTCTATTTCTTCGGTGTTACCAGCTCCCGCTCCTGCGTGTCCTCTGCGCTGCCTGTCCCATCCATTCTTATTCCACGCGTGCCCTATAGTC
+
111>1>AFFFFFFGGGGGGGGGGGCGGBBEGHHEEEEE?GHHCE/E0AFDGGBE/EE?FHGHFFBFFBGGHFHGHFHH2DE1///>/>CEF0@12B
@M02945:227:000000000-BHPRH:1:1101:18100:1700
GAGAGCTCTATTACTTCACTGGAGGGAAGCTCCCACGCGTGCCCTATAGTC
+
111>>AAFFFDFFFFGGGFFGGB1CEG?A0FGHFCBF00AE?EF0F0B12F

Reference

>ESR
GAGAGCTCTATTTCTTCGGTGTTACCAGCT
>Cercle
GAGAGCTCTATTACTGCACTGGAGGGAAGC

Command used

bbmap.sh in=r.fq out=stdout.sam ref=ref.fa k=5 maxindel=10 minid=0.7

Output

M02945:227:000000000-BHPRH:1:1101:17089:1699    4       *       0       0       *       *       0       0       GAGAGCTCTATTTCTTCGGTGTTACCAGCTCCCGCTCCTGCGTGTCCTCTGCGCTGCCTGTCCCATCCATTCTTATTCCACGCGTGCCCTATAGTC    111>1>AFFFFFFGGGGGGGGGGGCGGBBEGHHEEEEE?GHHCE/E0AFDGGBE/EE?FHGHFFBFFBGGHFHGHFHH2DE1///>/>CEF0@12B
M02945:227:000000000-BHPRH:1:1101:18100:1700    0       Cercle  1       5       15=1X14=21S     *       0       0       GAGAGCTCTATTACTTCACTGGAGGGAAGCTCCCACGCGTGCCCTATAGTC 111>>AAFFFDFFFFGGGFFGGB1CEG?A0FGHFCBF00AE?EF0F0B12F     NM:i:1  AM:i:5
ADD COMMENT

Login before adding your answer.

Traffic: 1747 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6