Question: Mismatch issue in bwa
2
gravatar for Bastien Hervé
14 months ago by
Bastien Hervé3.3k
Limoges, CBRS, France
Bastien Hervé3.3k wrote:

Hi everyone,

My aim is to map a fastq file (single-end) to two short reference sequences, to afterward split my reads file in two, according to the reference mapped. My reference sequences are 30bp long and my reads in my fastq file are 50-150bp long.

The problem is, bwa do not allow a mismatch in such short sequences and I would like too

What i tried :

bwa index references.fa
bwa mem references.fa reads.fq

SAM sub-result :

M02945:227:000000000-BHPRH:1:1101:17089:1699 0 ESR 1 60 30M66S * 0 0 GAGAGCTCTATTTCTTCGGTGTTACCAGCTCCCGCTCCTGCGTGTCCTCTGCGCTGCCTGTCCCATCCATTCTTATTCCACGCGTGCCCTATAGTC 11>1>AFFFFFFGGGGGGGGGGGCGGBBEGHHEEEEE?GHHCE/E0AFDGGBE/EE?FHGHFFBFFBGGHFHGHFHH2DE1///>/>CEF0@12B@ NM:i:0 MD:Z:30 AS:i:30 XS:i:0 M02945:227:000000000-BHPRH:1:1101:18100:1700 4 * 0 0 * * 0 0 GAGAGCTCTATTACTTCACTGGAGGGAAGCTCCCACGCGTGCCCTATAGTC 11>>AAFFFDFFFFGGGFFGGB1CEG?A0FGHFCBF00AE?EF0F0B12FD AS:i:0 XS:i:0

bwa-mem works great but on the "M02945:227:000000000-BHPRH:1:1101:18100:1700" read, there is a mismatch and bwa flagged this read as unmapped (FLAG = 4)

I also try to decrease the mismatch score to 0 bwa-mem -B 0 references.fa reads.fq and to manually generate a mismatch in a mapped read to see if the read length has an impact on the result.

Both did not work.

In my last attempt I used bwa-aln + bwa-samse instead of bwa-mem

bwa aln references.fa reads.fq > bwa_aln.sai
bwa samse references.fa bwa_aln.sai reads.fq

SAM sub-result :

M02945:227:000000000-BHPRH:1:1101:17089:1699 4 * 0 0 * * 0 0 GAGAGCTCTATTTCTTCGGTGTTACCAGCTCCCGCTCCTGCGTGTCCTCTGCGCTGCCTGTCCCATCCATTCTTATTCCACGCGTGCCCTATAGTC 11>1>AFFFFFFGGGGGGGGGGGCGGBBEGHHEEEEE?GHHCE/E0AFDGGBE/EE?FHGHFFBFFBGGHFHGHFHH2DE1///>/>CEF0@12B@ M02945:227:000000000-BHPRH:1:1101:18100:1700 4 * 0 0 * * 0 0 GAGAGCTCTATTACTTCACTGGAGGGAAGCTCCCACGCGTGCCCTATAGTC 11>>AAFFFDFFFFGGGFFGGB1CEG?A0FGHFCBF00AE?EF0F0B12FD

No one mapped...

I am a bit lost, is there a proper way to force bwa to allow a mismatch in short reads mapping or an other tool specific to this ?

Bwa version : Version: 0.7.12-r1039

Reference sequences :

>ESR

GAGAGCTCTATTTCTTCGGTGTTACCAGCT

>Cercle

GAGAGCTCTATTACTGCACTGGAGGGAAGC

Thanks a lot !

ADD COMMENTlink modified 14 months ago by genomax62k • written 14 months ago by Bastien Hervé3.3k
1

My reference sequences are 30pb long and my reads in my fastq file are 50-150pb long.

I've never done anything like this, but I have my doubts whether you are using the right tool for the job.

By the way, do you mean bp instead of pb? Might be language confusion, but bp = basepairs et pb est pairs de base? Je ne sais pas.

ADD REPLYlink modified 14 months ago • written 14 months ago by WouterDeCoster36k

Yes, sorry, french confusion here i meant bp in this post. May blast be better on this task ?

ADD REPLYlink written 14 months ago by Bastien Hervé3.3k
1

I have my doubts whether you are using the right tool for the job

Following Wouter's comment, in a recent post (A: Best tool for finding short alignment between to sequences) I suggested vmatch. Maybe you can have a look to see if it's more appropriate then bwa.

ADD REPLYlink written 14 months ago by dariober9.9k
1
gravatar for Macspider
14 months ago by
Macspider2.7k
Vienna - BOKU
Macspider2.7k wrote:

In any aligner worth using, you can adjust the seed length. You can do it in bwa as well:

   -k INT        minimum seed length [19]
ADD COMMENTlink written 14 months ago by Macspider2.7k

Thanks for advice, I tried different length of seed and sadly no impact on results.

ADD REPLYlink written 14 months ago by Bastien Hervé3.3k

Did you also lower the minimum accepted score? I think there's an option for that.

ADD REPLYlink written 13 months ago by Macspider2.7k
1
gravatar for genomax
14 months ago by
genomax62k
United States
genomax62k wrote:

You could give bbmap.sh from BBMap suite a try. There are many options you can play with.

Input sequence

@M02945:227:000000000-BHPRH:1:1101:17089:1699
GAGAGCTCTATTTCTTCGGTGTTACCAGCTCCCGCTCCTGCGTGTCCTCTGCGCTGCCTGTCCCATCCATTCTTATTCCACGCGTGCCCTATAGTC
+
111>1>AFFFFFFGGGGGGGGGGGCGGBBEGHHEEEEE?GHHCE/E0AFDGGBE/EE?FHGHFFBFFBGGHFHGHFHH2DE1///>/>CEF0@12B
@M02945:227:000000000-BHPRH:1:1101:18100:1700
GAGAGCTCTATTACTTCACTGGAGGGAAGCTCCCACGCGTGCCCTATAGTC
+
111>>AAFFFDFFFFGGGFFGGB1CEG?A0FGHFCBF00AE?EF0F0B12F

Reference

>ESR
GAGAGCTCTATTTCTTCGGTGTTACCAGCT
>Cercle
GAGAGCTCTATTACTGCACTGGAGGGAAGC

Command used

bbmap.sh in=r.fq out=stdout.sam ref=ref.fa k=5 maxindel=10 minid=0.7

Output

M02945:227:000000000-BHPRH:1:1101:17089:1699    4       *       0       0       *       *       0       0       GAGAGCTCTATTTCTTCGGTGTTACCAGCTCCCGCTCCTGCGTGTCCTCTGCGCTGCCTGTCCCATCCATTCTTATTCCACGCGTGCCCTATAGTC    111>1>AFFFFFFGGGGGGGGGGGCGGBBEGHHEEEEE?GHHCE/E0AFDGGBE/EE?FHGHFFBFFBGGHFHGHFHH2DE1///>/>CEF0@12B
M02945:227:000000000-BHPRH:1:1101:18100:1700    0       Cercle  1       5       15=1X14=21S     *       0       0       GAGAGCTCTATTACTTCACTGGAGGGAAGCTCCCACGCGTGCCCTATAGTC 111>>AAFFFDFFFFGGGFFGGB1CEG?A0FGHFCBF00AE?EF0F0B12F     NM:i:1  AM:i:5
ADD COMMENTlink modified 14 months ago • written 14 months ago by genomax62k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 652 users visited in the last hour