Pacbio de-novo assembly
10 weeks ago
Ap1438 ▴ 50

Hi,
I recently received PacBio HiFi reads, generated in CCS mode, for a whole-genome de novo assembly of a plant.
The sequencing facility sent two file types: fastq.gz and BAM.
I am confused about two things.

1. My understanding is that PacBio sequencing output is in BAM format by default. However, I suspect the BAM file I received is not the raw file but one produced by the CCS software with a command like "ccs movie.subreads.bam movie.ccs.bam".
2. I used hifiasm to generate primary contigs from the fastq file. Now I want to align the HiFi reads back to the assembly and filter out contigs with a read depth close to 0, and also align the contigs to plant mitochondrial and chloroplast genome sequences to detect organellar contigs. I am not sure which aligner to use; I have come across pbalign and minimap2 for this purpose.

I am new to working with PacBio data. Please let me know if you have any suggestions. Note: this is WGS HiFi data (not RNA-seq data).

9 weeks ago
gconcepcion ▴ 350

pbalign is outdated; pbmm2 is its spiritual successor and is really just a wrapper for minimap2: https://github.com/PacificBiosciences/pbmm2

pbmm2 can align either a bam OR a fasta/fastq to a reference genome

Is your CCS bam file suffixed with *.ccs.bam or *.hifi_reads.bam? If the latter, it is the HiFi subset of CCS reads; if the former, it is likely to be ALL CCS reads in the dataset, not filtered to Q20 and above.

Working from just the fasta/fastq should be fine for what you're trying to do. There is often additional information in the BAM files necessary for certain analyses (kinetics/basemods) but for your purpose working with the fasta/fastqs should be sufficient.
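
As a rough sketch of the FASTQ route, an alignment with pbmm2 could look something like the function below. The function name `align_hifi_pbmm2`, the file names and the thread count are placeholders for illustration, not anything taken from this thread. `--preset CCS` is the pbmm2 preset meant for CCS/HiFi reads; `--sort` and `-j` are assumed to be available in the installed pbmm2 version.

```shell
# Sketch only: wraps a pbmm2 run for HiFi reads; nothing runs until the function is called.
# Requires pbmm2 on PATH.
align_hifi_pbmm2() {
    ref="$1"; reads="$2"; out="$3"; threads="${4:-8}"
    # --preset CCS selects the CCS/HiFi alignment parameters,
    # --sort writes a coordinate-sorted BAM, -j sets the number of threads.
    pbmm2 align --preset CCS --sort -j "$threads" "$ref" "$reads" "$out"
}

# Example call (hypothetical file names):
# align_hifi_pbmm2 asm.p_ctg.fa hifi_reads.fastq.gz hifi_vs_asm.bam 16
```

Passing the FASTA directly skips the separate `pbmm2 index` step; building an `.mmi` index first mainly pays off when the same reference is aligned against repeatedly.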


So, the pbmm2 command is a little bit confusing. It has this --preset option with a CCS mode.

--preset                  STR   Set alignment mode. See below for preset parameter details. Valid choices: (SUBREAD,


Should I use the CCS preset here, or just do normal indexing and alignment?

pbmm2 index [options] <ref.fa|xml> <out.mmi>
pbmm2 align [options] <ref.fa|xml|mmi> <in.bam|xml|fa|fq|gz|fofn> [out.aligned.bam|xml]


Or

pbmm2 index [options] <ref.fa|xml> <out.mmi> --preset CCS

pbmm2 align [options] <ref.fa|xml|mmi> <in.bam|xml|fa|fq|gz|fofn> [out.aligned.bam|xml] --preset CCS


I ask because I used HiFi reads to generate the primary assembly, and now I want to map the HiFi reads back to the assembly to remove contigs with 0 reads mapped.

I am confused because I am using CCS-generated HiFi reads. So, what should I do?

9 weeks ago

Don't worry about the raw BAM file; it has no data that is useful to you at this stage. The CCS reads are what you need.

Use minimap2 to align your reads; you will see it has several presets:

   Preset:
-x STR       preset (always applied before other options; see minimap2.1 for details) []

- map-pb/map-ont - PacBio CLR/Nanopore vs reference mapping
- map-hifi - PacBio HiFi reads vs reference mapping
- ava-pb/ava-ont - PacBio/Nanopore read overlap
- asm5/asm10/asm20 - asm-to-ref mapping, for ~0.1/1/5% sequence divergence
- splice/splice:hq - long-read/Pacbio-CCS spliced alignment
- sr - genomic short-read mapping
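
Putting the `map-hifi` preset to work for the depth filter described in the question might look roughly like this. It is a sketch under assumptions, not a tested pipeline: the function names, file names and the depth cutoff of 1 are made up for illustration, the hifiasm primary contigs are assumed to be available as FASTA, and `samtools coverage` (samtools 1.10 or later) is used as one possible way to get a per-contig mean depth.

```shell
# Sketch only: nothing runs until the functions are called.
# Requires minimap2 (with the map-hifi preset) and samtools on PATH.

# Align HiFi reads back to the assembly and write a per-contig coverage table.
map_and_summarise() {
    asm="$1"; reads="$2"; prefix="$3"; threads="${4:-8}"
    minimap2 -ax map-hifi -t "$threads" "$asm" "$reads" \
        | samtools sort -@ "$threads" -o "$prefix.bam" -
    samtools index "$prefix.bam"
    samtools coverage "$prefix.bam" > "$prefix.coverage.tsv"
}

# Print the names of contigs whose mean depth is below a cutoff.
# Column 7 of the `samtools coverage` table is meandepth; header lines start with '#'.
low_depth_contigs() {
    table="$1"; cutoff="${2:-1}"
    awk -F'\t' -v c="$cutoff" '!/^#/ && $7 < c { print $1 }' "$table"
}

# Example calls (hypothetical file names):
# map_and_summarise asm.p_ctg.fa hifi_reads.fastq.gz hifi_vs_asm 16
# low_depth_contigs hifi_vs_asm.coverage.tsv 1 > contigs_to_drop.txt
```

The cutoff is a judgment call: "close to 0" in the question could reasonably mean anything well below the expected genome-wide depth, so it is worth looking at the full table before dropping contigs.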