Question: Alternative to BLASR ?
gravatar for Roxane Boyer
11 weeks ago by
Roxane Boyer180
Roxane Boyer180 wrote:

Hi everyone,

In order to polish my Falcon assembly of my D. suzukii genome, from PacBio long reads, I need to align the raw reads to my genome and create a .bam or a .sam file.

Such a tool already exist at PacBio, called blasr, but is currently not working (the 5.3 version I've installed do not provide a .bam or a .sam output... Which is VERY annoying).

Does an alternative of such aligner exist ? It has to be a aligner aware of the higher error rate of PacBio reads (that's why I can't use a classic aligner).

Any ideas ? I'm truely desesperate, I absolutely need to polish my assembly before going further in my analysis.

Thanks for your help !


pacbio alignement assembly • 258 views
ADD COMMENTlink modified 11 weeks ago by brent_wilson60 • written 11 weeks ago by Roxane Boyer180
gravatar for Medhat
11 weeks ago by
Medhat6.3k wrote:

bwa and BBmap (as far as I know limited to 6kb)

PacBio subreads or Oxford Nanopore reads to a reference genome:
bwa mem -x pacbio ref.fa reads.fq > aln.sam

more here

ADD COMMENTlink modified 11 weeks ago • written 11 weeks ago by Medhat6.3k

Roxane Boyer : With BBMap PacBio reads needed to be in fasta format and those above 6kb were broken into 6kb pieces during search automagically (with an _1, _2 etc). Since Brian Bushnell does not remove functionality from BBMap suite it should still work. Also see this original post. Try and let us know.

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by genomax224k

Okay thanks you a lot for the advices ! I'm going to try this !

ADD REPLYlink written 11 weeks ago by Roxane Boyer180

Hi Genomax,

I'm coming back to you again about BBmap. After having some issues with bwa-mem (depending on the number of threads used, I didn't get the same results), I want to try bbmap. I understood that the large reads from my pacbio sequencing will have to be sliced in smaller pieces. However, as it's my first time using this tool, I was looking for some advanced parameters for PacBio reads, here what I found ( ) :

$ -Xmx20g k=7 in=reads.fastq ref=reference.fa maxlen=1000 minlen=200 idtag ow int=f qin=33 out=mapped1.sam minratio=0.15 ignorequality slow ordered maxindel1=40 maxindel2=400

In this post, it is written that :

The "maxlen" flag shreds them to a max length of 1000; you can set that up to 6000. But I found 1000 gave a higher mapping rate.

But this section was "Oxford Nanopore" section, I assume that the author meant long reads generally, but as I'm not really sure, I wanted to have your opinion.

Should I use 6kb or 1kb for the reads slicing for PacBio data ?

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by Roxane Boyer180

What do you mean limited to 6kb ? The read length ? Because some of my reads are much longer than 6kb. Anyway, thanks for the tools I'm going to have a look on it

ADD REPLYlink written 11 weeks ago by Roxane Boyer180

Okay, I've tried the following lines :

./bwa index /media/loutre/SUZUKII/assembly_tries/pb_268_p_ctg.fasta
./bwa mem -x pacbio /media/loutre/SUZUKII/assembly_tries/pb_268_p_ctg.fasta /media/DATAPART1/Documents/suzukii_assembly/filtered_subreads.fastq | gzip -3 > /media/loutre/SUZUKII/assembly_tries/falcon_aligned_reads.sam.gz

Then I wanted to visualize my alignements on IGV with the decompressed sam file, but no reads where shown, IGV said that no reads was corresponding to my genome (but it was actually the right file).

Here the head of my sam file :

@SQ SN:000000F  LN:20279970
@SQ SN:000001F  LN:15392955
@SQ SN:000002F  LN:12197397
@SQ SN:000003F  LN:8269814
@SQ SN:000004F  LN:7009319
@SQ SN:000005F  LN:6740481
@SQ SN:000006F  LN:5550981
@SQ SN:000007F  LN:4663438
@SQ SN:000008F  LN:4675199
@SQ SN:000009F  LN:4040179

I'm not very familiar with the SAM format, but it looks like something is wrong. Did I missed something ? Do I have to reformat the output so it look like more at something like this :

@HD VN:1.5 SO:coordinate
@SQ SN:ref LN:45
r001   99 ref  7 30 8M2I4M1D3M = 37  39 TTAGATAAAGGATACTG *
r002    0 ref  9 30 3S6M1P1I4M *  0   0 AAAAGATAAGGATA    *
r003    0 ref  9 30 5S6M       *  0   0 GCCTAAGCTAA       * SA:Z:ref,29,-,6H5M,17,0;
r004    0 ref 16 30 6M14N5M    *  0   0 ATAGCTTCAGC       *
r003 2064 ref 29 17 6H5M       *  0   0 TAGGC             * SA:Z:ref,9,+,5S6M,30,1;
r001  147 ref 37 30 9M         =  7 -39 CAGCGGCAT         * NM:i:1
ADD REPLYlink written 10 weeks ago by Roxane Boyer180

Why do you pipe bwa mem to gzip? I haven't seen something like that before and would suggest this:

bwa mem ref.fasta input.fastq | samtools view -b - | samtools sort - > output.bam

After alignment the sam file is converted to bam and sorted without intermediate files.

ADD REPLYlink written 10 weeks ago by WouterDeCoster15k

Well, I've just copied what is written on the README file of bwa mem ! I'm not a original person, I've just followed the example they gived :

./bwa mem ref.fa read-se.fq.gz | gzip -3 > aln-se.sam.gz

But well, I'm going to try without compressing it.

To go back on my first file output, we agree that this sam file is empty right ? There is no informations inside it ?

ADD REPLYlink written 10 weeks ago by Roxane Boyer180

To get information about the reads in a sam/bam file use:

samtools idxstats yourfile.bam
samtools flagstat yourfile.bam
ADD REPLYlink written 10 weeks ago by WouterDeCoster15k

Well, it says that it failed to load the index, I guess it means it's not a real sam file ? The construction of my file look weird for me. I'm not sure if this is because I didn't saw a lot of Samf ile, or if it's because something really went wrong.

ADD REPLYlink written 10 weeks ago by Roxane Boyer180

Did you create an index of the file using the following command?

samtools index yourfile.bam

I've no experience with creating sam.gz files, so it's possible that this is the source of the problem

ADD REPLYlink written 10 weeks ago by WouterDeCoster15k
gravatar for brent_wilson
11 weeks ago by
Cofactor Genomics (St. Louis, MO)
brent_wilson60 wrote:

I have had much better luck for PACBIO alignment with BLAT, followed by some filtering for "true" alignments (alignment length, number of matches, etc.). It took a bit of work to set the parameters, and I haven't dealt with it in awhile, but it worked well for finding alignments (even for partial hits when I was working with draft assemblies).

Brent Wilson, PhD | Project Scientist | Cofactor Genomics

4044 Clayton Ave. | St. Louis, MO 63110 | tel. 314.531.4647

Catch the latest from Cofactor on our blog.

ADD COMMENTlink written 11 weeks ago by brent_wilson60

It would be useful to list those parameters, if you can dig them up. Will save people having to go through the same effort again (or at least serve as a starting point).

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by genomax224k

I would have, but everything is saved over at my previous position where I can't access it anymore. I do remember looking at the identity and coverage of each hit and using dot plots to train the parameters.

ADD REPLYlink written 11 weeks ago by brent_wilson60

Hi Brent, thanks for your indication ! So you don't have any memory of how we can parameter BLAT for such an use ? it will be really useful for me to have such an information ! If you remember of something, to hesitate to write it down here !

ADD REPLYlink written 10 weeks ago by Roxane Boyer180
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1489 users visited in the last hour