Question: BLASR/Bowtie2 short alignments
gravatar for Coryza
3.7 years ago by
Coryza380 wrote:

I'm struggling with the fact that I want to align thousands of ~4KB PacBio sequences to short primer sequences (~20B). (Just seeying which sequences contain the primer sequence). I've tried bowtie2 and blasr so far but both come up with empty results. Trying to change parameters influencing this resulted in nothing so far... so thats why I need your help ;) Anyone knows what parameters to use for this case?


Parameters I used now:

bowtie2 -f -S --local -N 1 -L 10 --no-sq --no-unal -p 12 -x <Index> -U <PacBio>

blasr <PacBio> <Reference> -noSplitSubreads -sam -minPctIdentity 90 -minMatch 10 -nproc 6

blasr bowtie2 • 1.3k views
ADD COMMENTlink modified 3.7 years ago by raghus60610 • written 3.7 years ago by Coryza380

How about aligning the primers against the PacBio sequences. That might produce better results. I recall that there are often problems when you try to soft-clip things this much.

ADD REPLYlink written 3.7 years ago by Devon Ryan88k

I've tried the other way around for blasr, no results. Furthermore: If it would have worked - how then retrieve all hits? Because most likely in that case it will match > 10K reads.

ADD REPLYlink written 3.7 years ago by Coryza380
gravatar for Brian Bushnell
3.7 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

I suggest you try BBDuk for this: in=reads.fq outm=matched.fq outu=unmatched.fq ref=primers.fa edist=3 k=20 -da

"edist=3" will allow up to 3 edits between the read and the reference (I'm assuming these are raw uncorrected reads).  K should be set to whatever the shortest sequence is in your set of primers.

If you want to use mapping instead, then map the primers to the reads as Devon suggested; then you can use (also in the BBMap package) to generate coverage stats from a sam file: in=file.sam covstats=covstats.txt nzo

The nzo flag will make it only print sequences that have nonzero coverage.  So, you'll get one line per read with coverage.  The first column will be the read name; so, you can filter the reads using these names with for example in=reads.fq names=names.txt out=filtered.fq include=t

The names file should be just the first column of the covstats output.

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by Brian Bushnell16k
gravatar for raghus606
3.7 years ago by
United States
raghus60610 wrote:

You could give it a try with BLAT, either way (Primer vs Reads or Reads vs Primer) should be fine, you will only need to filter the data differently.

ADD COMMENTlink written 3.7 years ago by raghus60610

Blat is probably not going to work. From UCSC:

Blat of DNA is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more.
ADD REPLYlink written 3.7 years ago by genomax63k

I have used BLAT for aligning long Pacbio reads to reference genomes, I have also obtained good results when I was aligning 150 b.p illumina reads to particular portions of the genome

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by raghus60610
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1794 users visited in the last hour