Understanding BWA parameters
9 months ago
nicole.leahy ▴ 20

I apologize in advance for the simplicity of this question. I haven't been able to find the answer yet, though that may be because of the large number of hits the search returns.

The short version of the question: I am trying to understand what the algorithm options (-k, -w, -d, etc.) mean for BWA, beyond the man page description.

I am on a project testing siRNA matches in a wide range of taxa. When I ask about the parameters, the only guidance I get is to "play around." At the same time, I am pressured for immediate results. I am wondering if there is a source that better explains the parameters.

For example: -k INT minimum seed length [19]

What does the seed length mean in this context, and what do the other parameters mean? The man page is here: https://bio-bwa.sourceforge.net/bwa.shtml

Again, I apologize for this question, but I'm pressured for immediate results while being told to "play around" and I'm at my wit's end.

BWA • 1.1k views

What is the length of your input reads, and what is the size of the siRNA you are expecting to see/match against? Are you looking to get ungapped matches?


The siRNA is approximately 300 bp long. I am to generate 21-mers from the sequence (1-21, 2-22, etc.), considering both each 21-mer and its reverse complement. For now, I'm just matching to human mRNA.

Here are the requirements:

  • The first 8 bases of the 21-mer (or its reverse complement) must be an exact match
  • Mismatches are allowed, but indels are not
  • Report all mRNAs with 14 or more matches
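
To make the requirements above concrete, here is a minimal brute-force sketch in Python. It is not how BWA works internally and it is far too slow for a full transcriptome; it only spells out the rules. It assumes "14 or more matches" means at least 14 identical bases out of 21 in an ungapped comparison, which may not be the intended reading. The function names (`reverse_complement`, `kmers`, `passes`, `scan`) are made up for this illustration.

```python
# Complement table; U is treated like T so RNA input also works.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Return the reverse complement as DNA letters."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k=21):
    """Yield (1-based start, k-mer) for every window: 1-21, 2-22, ..."""
    for i in range(len(seq) - k + 1):
        yield i + 1, seq[i:i + k]


def passes(query, window, seed_len=8, min_matches=14):
    """Ungapped check of one query against one equal-length target window.

    The first seed_len bases must be identical, and at least min_matches
    positions overall must be identical (assumed meaning of "14 or more
    matches"). No indels are possible because positions are compared 1:1.
    """
    if len(query) != len(window):
        return False
    if query[:seed_len] != window[:seed_len]:
        return False
    matches = sum(q == t for q, t in zip(query, window))
    return matches >= min_matches


def scan(sirna, mrna, k=21):
    """Return (query_start, strand, mrna_start) for every passing window."""
    sirna, mrna = sirna.upper(), mrna.upper()
    hits = []
    for start, kmer in kmers(sirna, k):
        for strand, query in (("+", kmer), ("-", reverse_complement(kmer))):
            for j in range(len(mrna) - k + 1):
                if passes(query, mrna[j:j + k]):
                    hits.append((start, strand, j + 1))
    return hits
```

Running `scan` on a handful of sequences where the answer is known gives a ground truth to compare any aligner output against.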

I would highly suggest you ask this question on the BWA GitHub page. Hopefully someone more knowledgeable (such as Heng Li) can answer it there. This is actually an advanced question, not a simple one!

Other people have asked similar questions about specific parameters, so searching for "biostars k bwa" might give good results, e.g. "how to understand the -k parameter in bwa mem method?"

My last suggestion is to take ~10 reads, simulated or high quality, whose true mapping positions you know, and use them as a test dataset for tuning. Start with one parameter, vary it over a sensible range, and look at the effect on the results. Hopefully this will give you some idea of what each parameter does.
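
One way to organize that kind of one-parameter sweep is sketched below in Python. It only builds the `bwa mem` command lines for a range of `-k` values and does not run anything. The file names (`ref.fa`, `known_reads.fq`, `k19.sam`) and the helper `bwa_mem_commands` are placeholders invented for this example, and the reference is assumed to have been indexed with `bwa index` beforehand.

```python
import shlex


def bwa_mem_commands(ref, reads, seed_lengths):
    """Build one `bwa mem` command per -k value; nothing is executed here.

    Returns a list of (argv, output_sam_name) pairs so that each run can
    be launched separately and its output compared to the known positions.
    """
    commands = []
    for k in seed_lengths:
        argv = ["bwa", "mem", "-k", str(k), ref, reads]
        commands.append((argv, f"k{k}.sam"))
    return commands


# Print the commands for a few candidate seed lengths (19 is the default).
for argv, out in bwa_mem_commands("ref.fa", "known_reads.fq", [10, 15, 19, 25]):
    print(shlex.join(argv), ">", out)
```

Keeping every other option at its default while only `-k` changes makes it easier to attribute any difference in the output to that one parameter.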
