Question: BLAST: One read matches same region multiple times
0
3.2 years ago by
godeludanu30
godeludanu30 wrote:

I am aligning nanopore reads to the C. elegans genome to assess coverage across the genome.

There is a region in the C. elegans genome with a very high number of matching reads (an order of magnitude more than other regions). I think this is because it's a repeat region with lots of homopolymers, so reads from this region contain many errors and their alignment here is ambiguous. As a result, a single read blasted to this annoying region ends up with multiple hits because BLAST can't determine the best alignment.

Can you suggest any strategies to work around this? My current thought is to prevent BLAST from finding multiple hits for a single read in the same region. Is this a good strategy, and what is the best way to implement it?

Thanks for your time.

nanopore blast • 1.1k views
modified 3.2 years ago by WouterDeCoster39k • written 3.2 years ago by godeludanu30

I have no experience with Nanopore, but I'm wondering whether BLAST is the right tool for read mapping in general. BLAST is tuned to find regions of similarity between possibly distant species, so it expects to find a sequence aligned at multiple places, and I think it doesn't have the concept of 'mapping quality' (i.e. the probability that the mapping is wrong, as opposed to an alignment score or e-value). I would suggest trying bwa mem, which is designed to work with long reads, possibly split across large gaps.

written 3.2 years ago by dariober10k
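
To make the mapping-quality idea in the comment above concrete, here is a minimal sketch that filters SAM records (as produced by a read mapper such as bwa mem) by MAPQ. The function name `filter_by_mapq` and the threshold of 20 are assumptions chosen for illustration, not values recommended anywhere in this thread; the field positions and FLAG bits follow the SAM format.

```python
def filter_by_mapq(sam_lines, min_mapq=20):
    """Yield primary SAM alignment lines whose MAPQ is at least min_mapq.

    min_mapq=20 is an illustrative default, not a recommendation.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: mapping quality
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        if flag & (0x4 | 0x100 | 0x800):
            continue
        if mapq >= min_mapq:
            yield line
```

Reads that map equally well to several copies of a repeat typically receive a low MAPQ, so a filter like this drops the ambiguous placements instead of counting them towards coverage. Dropping supplementary records is a simplification here; for split long-read alignments you may prefer to keep them.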
2
3.2 years ago by
abascalfederico1.1k
Spain
abascalfederico1.1k wrote:

There is no simple solution for repetitive regions. If you are not interested in them, why don't you mask them from the genome? You can mask according to RepeatMasker, TRF (Tandem Repeats Finder) and/or dust.

HTH

written 3.2 years ago by abascalfederico1.1k
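
As a rough illustration of the masking suggestion above, the sketch below hard-masks a set of intervals in a sequence by replacing them with N. It assumes the repeat coordinates have already been obtained from a tool such as RepeatMasker or TRF and converted to 0-based, half-open (start, end) pairs; that conversion step and the name `hard_mask` are hypothetical and not part of any of those tools.

```python
def hard_mask(sequence, intervals, mask_char="N"):
    """Return sequence with every (start, end) interval replaced by mask_char.

    Intervals are assumed to be 0-based and half-open, as in BED files.
    """
    seq = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range intervals are harmless
        start = max(0, start)
        end = min(len(seq), end)
        for i in range(start, end):
            seq[i] = mask_char
    return "".join(seq)


# Example: mask positions 2..4 of a short sequence
masked = hard_mask("ACGTACGTAC", [(2, 5)])
```

Aligning against the masked reference means reads from the repeat no longer pile up there, at the cost of losing that region from the coverage analysis entirely.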

While this strategy can work, it sounds like @godeludanu is interested in (finding and) keeping the "best" alignment in this region. I don't know how long the reads are in this case, but trying a different aligner (e.g. LASTZ) may be a better option.

modified 3.2 years ago • written 3.2 years ago by genomax68k
2
3.2 years ago by
shwethacm200
Seattle, WA
shwethacm200 wrote:

My first thought is: how long are your reads? If they are several kilobases in length, then use bwa-mem (with a long-read preset such as -x pacbio or -x ont2d), BLASR, or other mapping tools that are tuned to align PacBio and PacBio-like reads. These are optimized for long read lengths. You will have to filter your output file to find the best alignment.

written 3.2 years ago by shwethacm200
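
The filtering step mentioned above could look something like the following sketch, which keeps only the highest-scoring hit per read. It assumes BLAST tabular output with the default columns (-outfmt 6), where column 1 is the query ID and column 12 is the bit score; the function name `best_hit_per_read` is made up for this example, and other aligners will need a different parser.

```python
def best_hit_per_read(tabular_lines):
    """Return {read_id: fields} holding the highest bit-score hit per read.

    Assumes default BLAST tabular columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        read_id = fields[0]
        bitscore = float(fields[11])
        # Keep the first hit seen when bit scores tie
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, fields)
    return {read_id: fields for read_id, (_, fields) in best.items()}
```

With one hit per read, a read that produced several overlapping hits in the repeat region contributes to coverage only once. Which of several near-equal hits is "best" remains ambiguous in a repeat, so this reduces double counting but does not resolve the true origin of the read.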
1
3.2 years ago by
Belgium
WouterDeCoster39k wrote:

LAST is an aligner which is used more often for Nanopore sequencing. Perhaps using NanoOK could tell you a lot about your data: https://github.com/TGAC/NanoOK

written 3.2 years ago by WouterDeCoster39k