Question: Alternative to BLASR ?
gravatar for Roxane Boyer
2.0 years ago by
Roxane Boyer870
France / Toulouse / GeT-Plage
Roxane Boyer870 wrote:

Hi everyone,

In order to polish my Falcon assembly of my D. suzukii genome, from PacBio long reads, I need to align the raw reads to my genome and create a .bam or a .sam file.

Such a tool already exist at PacBio, called blasr, but is currently not working (the 5.3 version I've installed do not provide a .bam or a .sam output... Which is VERY annoying).

Does an alternative of such aligner exist ? It has to be a aligner aware of the higher error rate of PacBio reads (that's why I can't use a classic aligner).

Any ideas ? I'm truely desesperate, I absolutely need to polish my assembly before going further in my analysis.

Thanks for your help !


pacbio alignement assembly • 1.7k views
ADD COMMENTlink modified 5 months ago by harish140 • written 2.0 years ago by Roxane Boyer870
gravatar for Medhat
2.0 years ago by
Medhat8.1k wrote:

bwa and BBmap (as far as I know limited to 6kb)

PacBio subreads or Oxford Nanopore reads to a reference genome:
bwa mem -x pacbio ref.fa reads.fq > aln.sam

more here



is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. minimap2 is tens of times faster than mainstream long-read mappers such as BLASR, BWA-MEM, NGMLR and GMAP.

ADD COMMENTlink modified 27 days ago • written 2.0 years ago by Medhat8.1k

Roxane Boyer : With BBMap PacBio reads needed to be in fasta format and those above 6kb were broken into 6kb pieces during search automagically (with an _1, _2 etc). Since Brian Bushnell does not remove functionality from BBMap suite it should still work. Also see this original post. Try and let us know.

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by genomax60k

Okay thanks you a lot for the advices ! I'm going to try this !

ADD REPLYlink written 2.0 years ago by Roxane Boyer870

Hi Genomax,

I'm coming back to you again about BBmap. After having some issues with bwa-mem (depending on the number of threads used, I didn't get the same results), I want to try bbmap. I understood that the large reads from my pacbio sequencing will have to be sliced in smaller pieces. However, as it's my first time using this tool, I was looking for some advanced parameters for PacBio reads, here what I found ( ) :

$ -Xmx20g k=7 in=reads.fastq ref=reference.fa maxlen=1000 minlen=200 idtag ow int=f qin=33 out=mapped1.sam minratio=0.15 ignorequality slow ordered maxindel1=40 maxindel2=400

In this post, it is written that :

The "maxlen" flag shreds them to a max length of 1000; you can set that up to 6000. But I found 1000 gave a higher mapping rate.

But this section was "Oxford Nanopore" section, I assume that the author meant long reads generally, but as I'm not really sure, I wanted to have your opinion.

Should I use 6kb or 1kb for the reads slicing for PacBio data ?

ADD REPLYlink modified 24 months ago • written 24 months ago by Roxane Boyer870

What do you mean limited to 6kb ? The read length ? Because some of my reads are much longer than 6kb. Anyway, thanks for the tools I'm going to have a look on it

ADD REPLYlink written 2.0 years ago by Roxane Boyer870

Okay, I've tried the following lines :

./bwa index /media/loutre/SUZUKII/assembly_tries/pb_268_p_ctg.fasta
./bwa mem -x pacbio /media/loutre/SUZUKII/assembly_tries/pb_268_p_ctg.fasta /media/DATAPART1/Documents/suzukii_assembly/filtered_subreads.fastq | gzip -3 > /media/loutre/SUZUKII/assembly_tries/falcon_aligned_reads.sam.gz

Then I wanted to visualize my alignements on IGV with the decompressed sam file, but no reads where shown, IGV said that no reads was corresponding to my genome (but it was actually the right file).

Here the head of my sam file :

@SQ SN:000000F  LN:20279970
@SQ SN:000001F  LN:15392955
@SQ SN:000002F  LN:12197397
@SQ SN:000003F  LN:8269814
@SQ SN:000004F  LN:7009319
@SQ SN:000005F  LN:6740481
@SQ SN:000006F  LN:5550981
@SQ SN:000007F  LN:4663438
@SQ SN:000008F  LN:4675199
@SQ SN:000009F  LN:4040179

I'm not very familiar with the SAM format, but it looks like something is wrong. Did I missed something ? Do I have to reformat the output so it look like more at something like this :

@HD VN:1.5 SO:coordinate
@SQ SN:ref LN:45
r001   99 ref  7 30 8M2I4M1D3M = 37  39 TTAGATAAAGGATACTG *
r002    0 ref  9 30 3S6M1P1I4M *  0   0 AAAAGATAAGGATA    *
r003    0 ref  9 30 5S6M       *  0   0 GCCTAAGCTAA       * SA:Z:ref,29,-,6H5M,17,0;
r004    0 ref 16 30 6M14N5M    *  0   0 ATAGCTTCAGC       *
r003 2064 ref 29 17 6H5M       *  0   0 TAGGC             * SA:Z:ref,9,+,5S6M,30,1;
r001  147 ref 37 30 9M         =  7 -39 CAGCGGCAT         * NM:i:1
ADD REPLYlink written 2.0 years ago by Roxane Boyer870

Why do you pipe bwa mem to gzip? I haven't seen something like that before and would suggest this:

bwa mem ref.fasta input.fastq | samtools view -b - | samtools sort - > output.bam

After alignment the sam file is converted to bam and sorted without intermediate files.

ADD REPLYlink written 2.0 years ago by WouterDeCoster35k

Well, I've just copied what is written on the README file of bwa mem ! I'm not a original person, I've just followed the example they gived :

./bwa mem ref.fa read-se.fq.gz | gzip -3 > aln-se.sam.gz

But well, I'm going to try without compressing it.

To go back on my first file output, we agree that this sam file is empty right ? There is no informations inside it ?

ADD REPLYlink written 2.0 years ago by Roxane Boyer870

To get information about the reads in a sam/bam file use:

samtools idxstats yourfile.bam
samtools flagstat yourfile.bam
ADD REPLYlink written 2.0 years ago by WouterDeCoster35k

Well, it says that it failed to load the index, I guess it means it's not a real sam file ? The construction of my file look weird for me. I'm not sure if this is because I didn't saw a lot of Samf ile, or if it's because something really went wrong.

ADD REPLYlink written 2.0 years ago by Roxane Boyer870

Did you create an index of the file using the following command?

samtools index yourfile.bam

I've no experience with creating sam.gz files, so it's possible that this is the source of the problem

ADD REPLYlink written 2.0 years ago by WouterDeCoster35k
gravatar for brent_wilson
2.0 years ago by
Cofactor Genomics (St. Louis, MO)
brent_wilson90 wrote:

I have had much better luck for PACBIO alignment with BLAT, followed by some filtering for "true" alignments (alignment length, number of matches, etc.). It took a bit of work to set the parameters, and I haven't dealt with it in awhile, but it worked well for finding alignments (even for partial hits when I was working with draft assemblies).

Brent Wilson, PhD | Project Scientist | Cofactor Genomics

4044 Clayton Ave. | St. Louis, MO 63110 | tel. 314.531.4647

Catch the latest from Cofactor on our blog.

ADD COMMENTlink written 2.0 years ago by brent_wilson90

It would be useful to list those parameters, if you can dig them up. Will save people having to go through the same effort again (or at least serve as a starting point).

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by genomax60k

I would have, but everything is saved over at my previous position where I can't access it anymore. I do remember looking at the identity and coverage of each hit and using dot plots to train the parameters.

ADD REPLYlink written 2.0 years ago by brent_wilson90

Hi Brent, thanks for your indication ! So you don't have any memory of how we can parameter BLAT for such an use ? it will be really useful for me to have such an information ! If you remember of something, to hesitate to write it down here !

ADD REPLYlink written 2.0 years ago by Roxane Boyer870
gravatar for harish
5 months ago by
harish140 wrote:


Sorry to bring up the thread, but I wanted to put it down in case someone else faced the issue. I have recently used minimap2 to map the PacBio reads and then use pbbamify to put the alignment in the requisite format for Arrow.

I had about 100x data for a genome I am assembling and blasr was taking ages.

Please do give this a try!

ADD COMMENTlink written 5 months ago by harish140

Hello and thanks for all these suggestions. These further comments might help someone else.

I have used minimap2 to map PacBio raw reads (converted to fastq as required for minimap2) against my de novo assembly. I recommend this approach over blasr. Use samtools to get the aligned.bam. Minimap2 does not output bam files.

I then used smrt tools to create a mydata.xml file using the following:

create mydata.xml /bam/*subreads.bam

Then use pbbamify to convert the aligned.bam to correct pb format for submission to genomicconsensus arrow. Took a while to work through this process - thought it might help someone else.

pbbamify --input aligned.bam --output=aligned.pb.bam ref.fasta mydata.xml

ADD REPLYlink written 5 weeks ago by peri.tobias10

Use samtools to get the aligned.bam. Minimap2 does not output bam files.

Just FYI (unclear how you did it) that you can do

minimap2 -t 8 -a genome.fa reads.fastq.gz | samtools sort -@8 -o my_sorted_alignment.bam

and as such avoid intermediate files and directly create a bam file (indeed using samtools).

ADD REPLYlink written 5 weeks ago by WouterDeCoster35k

Yes, you are correct. I should have been clearer. This is the script I used.

minimap2 -ax map-pb genome.fasta *.fastq \ | samtools view -hF 256 - \ | samtools sort -@ 8 -o genome_aligned.bam

I should also say that I am still having some trouble with pbbamify. So I haven't got this whole step working seamlessly yet.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by peri.tobias10

Minimap doesn't really care for fastq files tbh. As such the Sequel runs have series of '!!!!!!' representing the qualities anyways.

You can definitely use Minimap's "-a" parameter to get a sam file and then convert it into Arrow specific format using pbbamify.

Also take a look at pbbioconda repos. If you have the Bams from Sequel, you can directly use Pbmm2, which uses minimap api to align the reads to the genome.

Plus these days you don't really need SMRTlink tools unless you are really looking at specific analysis. Most genome analysis these days are covered in pbbioconda!

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by harish140

Thanks for these tips @harish. I did not know that I could directly use subreads with Pbmm2. Actually this all looks more promising as a way forward.

Pbbamify worked after 130 hours of walltime. It output a very large pb.bam (351G) from aligned.bam (36G). Arrow still needs a pbi indexed file according to error report.

I really appreciate your help on this!

ADD REPLYlink written 28 days ago by peri.tobias10

You can use pbindex to generate the index files.

ADD REPLYlink written 28 days ago by harish140
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1786 users visited in the last hour