Polishing a Canu assembly using Quiver
2.7 years ago
renhaol

Hi everyone,

I am new to genome assembly, and I am trying to use quiver to polish my Canu assembly. I am using the yeast W303 data set, RS2 P4-C2 (https://github.com/PacificBiosciences/DevNet/wiki/Saccharomyces-cerevisiae-W303-Assembly-Contigs). Apparently pbalign no longer supports cmp.h5 output. I used pbalign to generate a .bam file and samtools to sort it, then ran quiver with the sorted .bam file as input and the FASTA file produced by Canu as the reference. However, quiver gave me this error:

"This does not appear to be a valid PacBio BAM file. Only datasets from RS II and Sequel instruments are supported by this program."


I also tried writing a .sam file and converting it to .bam with samtools, and I tried using the .bam output from pbalign directly as input for quiver. Both attempts gave the same error message.

I saw in another post that I should convert the bax.h5 files to .subreads.bam with bax2bam and use that as the input for pbalign. However, when I tried that, bax2bam gave this error:

ERROR: unsupported sequencing chemistry combination:
binding kit:        100236500
sequencing kit:     001558034
basecaller version: 2.0.1.49.123864

ERROR: BindingKit, SequencingKit, and ChangeListID are mandatory but unavailable
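Both errors seem to concern BAM header metadata rather than the alignments themselves. As a rough diagnostic, here is a minimal sketch that inspects header text (for example, the output of `samtools view -H file.bam`) and lists which PacBio-specific fields are missing. It assumes the PacBio BAM specification's conventions: a `pb:` version tag on the `@HD` line and semicolon-separated `KEY=VALUE` pairs in each `@RG` `DS` field. The `REQUIRED_DS_KEYS` list is an assumption; the thread does not say exactly what quiver checks.

```python
"""Sketch: report PacBio-specific fields missing from a BAM header.

Assumes header text as printed by `samtools view -H`, plus the PacBio BAM
spec conventions (pb: tag on @HD, KEY=VALUE;... pairs in @RG DS).
"""

# Assumed subset of DS keys a PacBio read group should carry.
REQUIRED_DS_KEYS = ("READTYPE", "BINDINGKIT", "SEQUENCINGKIT", "BASECALLERVERSION")


def parse_ds(ds_value):
    """Split 'KEY1=v1;KEY2=v2' into a dict (values may themselves contain '=')."""
    fields = {}
    for item in ds_value.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def check_pacbio_header(header_text):
    """Return a list of problems; an empty list means the header looks PacBio-like."""
    problems = []
    lines = header_text.splitlines()

    # @HD should carry a pb: tag giving the PacBio BAM format version.
    hd_lines = [line for line in lines if line.startswith("@HD")]
    has_pb = any(tag.startswith("pb:")
                 for line in hd_lines for tag in line.split("\t")[1:])
    if not has_pb:
        problems.append("@HD line has no pb: (PacBio BAM version) tag")

    # Every read group should describe its chemistry in the DS field.
    rg_lines = [line for line in lines if line.startswith("@RG")]
    if not rg_lines:
        problems.append("no @RG read groups")
    for rg in rg_lines:
        tags = dict(t.split(":", 1) for t in rg.split("\t")[1:] if ":" in t)
        rg_id = tags.get("ID", "?")
        ds = parse_ds(tags.get("DS", ""))
        for key in REQUIRED_DS_KEYS:
            if key not in ds:
                problems.append(f"@RG {rg_id}: DS lacks {key}")
    return problems


if __name__ == "__main__":
    import subprocess
    import sys

    # Read the header of the BAM file given on the command line via samtools.
    text = subprocess.run(["samtools", "view", "-H", sys.argv[1]],
                          capture_output=True, text=True, check=True).stdout
    for problem in check_pacbio_header(text) or ["header looks like a PacBio BAM"]:
        print(problem)
```

A header produced by a generic SAM-to-BAM round trip would typically report the `pb:` tag and the `DS` keys as missing, which would be consistent with quiver refusing the file.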


Any suggestions would be greatly appreciated!

22 months ago

Hi, I hope you have solved your issue by now, but in case someone else runs into a similar problem, here is the solution. Several steps are needed before you can use PacBio data with Quiver/Arrow to improve a genome assembly.

Disclaimer: I installed all the programs through conda.

Step 1. Unzip the *.bax.h5 files from the downloaded *.zip files. As you are probably aware, they are quite large.

Step 2. Convert the .bax.h5 files provided with the PacBio data using the bax2bam* tool (conda / github). Remember that the input files for each bax2bam run must come from the same movie.

Step 3. If you have multiple PacBio runs, merge the output .bam files with samtools merge*.

Step 4. Index the merged .bam file with pbindex*, which is part of pbbam (conda / github).

Step 5. Align your PacBio reads with pbalign (conda / github) and write the output as *.bam. Despite what the GitHub page says, versions >0.4 have new features that allow this.

Step 6. Index the output .bam file with pbindex*.

Step 7. Index your reference with samtools faidx.

Step 8. Run quiver or arrow (conda / github).
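Steps 2 to 8 above can be sketched as a small Python driver that builds the command lines and, optionally, runs them in order. This is a hypothetical helper, not part of any PacBio tool, and it rests on assumptions not stated in the post: RS II file names follow `<movie>.<N>.bax.h5`, `bax2bam <files> -o <prefix>` writes `<prefix>.subreads.bam`, pbalign takes `reads reference output` positionally, and quiver/arrow take `-r` for the reference and `-o` for the output. Step 1 (unzipping) is left out, and the output names `merged.subreads.bam`, `aligned.bam`, and `polished.fasta` are made up for illustration.

```python
"""Hypothetical driver for the bax.h5 -> Quiver/Arrow steps (a sketch)."""
import re
import subprocess
from collections import defaultdict
from pathlib import Path

# Assumed RS II naming: <movie>.<N>.bax.h5 (usually three parts per movie).
BAX_RE = re.compile(r"^(?P<movie>.+)\.\d+\.bax\.h5$")


def group_by_movie(bax_files):
    """Group bax.h5 paths by movie name, since bax2bam takes one movie at a time."""
    groups = defaultdict(list)
    for path in bax_files:
        match = BAX_RE.match(Path(path).name)
        if match is None:
            raise ValueError(f"not a bax.h5 file name: {path}")
        groups[match.group("movie")].append(path)
    return {movie: sorted(paths) for movie, paths in sorted(groups.items())}


def build_commands(bax_files, reference, tool="quiver", workdir="."):
    """Return the argv lists for steps 2-8, in execution order."""
    work = Path(workdir)
    cmds, subread_bams = [], []
    # Step 2: one bax2bam call per movie.
    for movie, paths in group_by_movie(bax_files).items():
        prefix = str(work / movie)
        cmds.append(["bax2bam", *paths, "-o", prefix])
        subread_bams.append(prefix + ".subreads.bam")
    if not subread_bams:
        raise ValueError("no bax.h5 files given")
    # Step 3: merge only when there is more than one movie.
    if len(subread_bams) > 1:
        reads = str(work / "merged.subreads.bam")
        cmds.append(["samtools", "merge", reads, *subread_bams])
    else:
        reads = subread_bams[0]
    aligned = str(work / "aligned.bam")
    cmds += [
        ["pbindex", reads],                       # Step 4: index the reads
        ["pbalign", reads, reference, aligned],   # Step 5: align, BAM output
        ["pbindex", aligned],                     # Step 6: index the alignments
        ["samtools", "faidx", reference],         # Step 7: index the reference
        [tool, aligned, "-r", reference,          # Step 8: polish
         "-o", str(work / "polished.fasta")],
    ]
    return cmds


def run_pipeline(cmds, run=False):
    """Print each command; execute it only if run=True, stopping on failure."""
    for cmd in cmds:
        print(" ".join(cmd))
        if run:
            subprocess.run(cmd, check=True)
```

For example, `run_pipeline(build_commands(glob.glob("*.bax.h5"), "canu.contigs.fasta"))` (after `import glob`) only prints the commands so they can be checked first; passing `run=True` executes them. Keeping each command as an argument list avoids shell quoting issues.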

Detailed hints for pbalign, quiver and arrow options can be found here: LINK

18 months ago
renhaol

Hi, I solved this problem by using an older version of SMRT Analysis, smrtanalysis v2.3.0p5, which includes pbalign and quiver.