Question: Which Aligners Recognize Soft-Masked Repeats In Reference Sequences?
Jeremy Leipzig (Philadelphia, PA) wrote, 10.0 years ago:

Which aligners (long and short read) behave differently when parts of a reference/target sequence are "soft-masked", i.e. have portions in lowercase to designate repeat regions?

lh3 (United States) wrote, 10.0 years ago:

No. Do not align to a masked genome for any purpose. Align to the whole, unmasked genome, then filter out the reads that map to the masked regions.
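A minimal sketch of that post-alignment filter, assuming the reference FASTA marks repeats in lowercase and that the alignments have already been reduced to (chrom, start, end) tuples in 0-based, half-open coordinates. The function names are hypothetical and not part of any aligner or toolkit.

```python
import re

def softmasked_intervals(fasta_text):
    """Map each sequence name to its lowercase runs as (start, end), 0-based half-open."""
    seqs = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Assumption: the sequence name is the first word of the header line.
            name = line[1:].split()[0]
            seqs[name] = []
        elif line and name is not None:
            seqs[name].append(line)
    return {
        chrom: [m.span() for m in re.finditer(r"[a-z]+", "".join(parts))]
        for chrom, parts in seqs.items()
    }

def overlaps_masked(masked, chrom, start, end):
    """True if [start, end) on chrom overlaps any soft-masked interval."""
    return any(s < end and start < e for s, e in masked.get(chrom, []))

def drop_masked_hits(masked, hits):
    """Keep only the (chrom, start, end) hits that touch no soft-masked base."""
    return [h for h in hits if not overlaps_masked(masked, *h)]
```

The linear scan in overlaps_masked is fine for a sketch; for a full genome an interval index or a dedicated tool working on BED intervals would be the more practical choice.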


Masking has never been perfect and probably never will be. It leads to wrongly mapped sequences, spurious SNP/indel calls and all sorts of other problems. I cannot think of a single use case where masking leads to better outcomes. Trust me. Do not mask.

Reply by lh3, written 10.0 years ago

Yup. Do not mask. You get the most accurate alignment when you align to what is actually there. What you do not want are reads that really belong to repetitive regions being forced to align to the wrong place because you didn't provide the correct sequence for the read to align to.

bwa does not care about lowercase nucleotides.

Reply by swbarnes2, written 7.6 years ago

What difference will it make, except for paired-end or spliced mappings?

Reply by Darked89, written 10.0 years ago

So I assume BWA does not care about lowercase nucleotides?

Reply by Jeremy Leipzig, written 10.0 years ago

BWA always uses all bases in alignment. Again, do not mask, unless you are looking for trouble.

Reply by lh3, written 10.0 years ago
Haibao Tang (Mountain View, CA) wrote, 10.0 years ago:

In LASTZ, soft-masked regions are NOT available for seeding, but alignments are allowed to extend into them. LASTZ also lets you specify a separate file with the intervals to mask (with softmask=<mask_file>).
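To make the seeding/extension distinction concrete, here is a toy ungapped seed-and-extend sketch. It is only an illustration of the idea under simplifying assumptions (exact k-mer seeds, exact-match extension, a hypothetical function name); LASTZ's real seeding, scoring and gapped extension are considerably more involved.

```python
def seed_and_extend(target, query, k=4):
    """Toy ungapped seed-and-extend where lowercase target bases cannot seed.

    Returns sorted (target_start, target_end, query_start, query_end) tuples,
    0-based and half-open.
    """
    t_upper, q_upper = target.upper(), query.upper()
    hits = set()
    for t in range(len(target) - k + 1):
        # Seeding: skip any target k-mer that contains a soft-masked (lowercase) base.
        if not target[t:t + k].isupper():
            continue
        for q in range(len(query) - k + 1):
            if t_upper[t:t + k] != q_upper[q:q + k]:
                continue
            # Extension: compare case-insensitively, so the match may run
            # into soft-masked bases on either side of the seed.
            ts, qs = t, q
            while ts > 0 and qs > 0 and t_upper[ts - 1] == q_upper[qs - 1]:
                ts, qs = ts - 1, qs - 1
            te, qe = t + k, q + k
            while te < len(target) and qe < len(query) and t_upper[te] == q_upper[qe]:
                te, qe = te + 1, qe + 1
            hits.add((ts, te, qs, qe))
    return sorted(hits)
```

With an unmasked core flanked by lowercase bases, the seeds come only from the core but the reported match can cover the flanks; with a fully lowercase target nothing is seeded, so nothing is reported.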

Ali R. Vahdati (Zurich, Switzerland) wrote, 7.6 years ago:

FSA also takes soft-masked regions into account when run with the --softmasked option.
