Pacbio de-novo assembly
8 weeks ago
Ap1438 ▴ 50

Hi,
I recently received PacBio HiFi reads, generated in CCS mode, for a de novo whole-genome assembly of a plant.
The sequencing facility sent me two file types: FASTQ (fastq.gz) and BAM.
I am confused about two things.

  1. My understanding is that PacBio sequencing output is in BAM format by default. However, I think the BAM file I received is not the raw file but was produced by the CCS software with a command like "ccs movie.subreads.bam movie.ccs.bam".
  2. I used hifiasm to generate primary contigs from the FASTQ file. Now I want to align the HiFi reads back to the assembly and filter out contigs with a read depth close to 0, and also align the contigs to plant mitochondrial and chloroplast genome sequences to detect organellar contigs. I am not sure which aligner to use. I have come across pbalign and minimap2 for this purpose.

I am new to working with PacBio data, so please let me know if you have any suggestions. Note: this is WGS HiFi data, not RNA-seq data.

Pacbio minimap2 de-novo assembly alligner • 333 views
8 weeks ago
gconcepcion ▴ 350

pbalign is outdated. pbmm2 is its spiritual successor and is really just a wrapper for minimap2: https://github.com/PacificBiosciences/pbmm2

pbmm2 can align either a bam OR a fasta/fastq to a reference genome

Is your CCS bam file suffixed with *.ccs.bam or *.hifi_reads.bam? If the latter, then it is the HiFi subset of CCS reads, if the former, then it is likely to be ALL CCS reads in the dataset, not filtered for Q20 reads.
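To make the Q20 cutoff concrete, here is a minimal Python sketch of the arithmetic behind it. It assumes the per-read predicted accuracy is available as a number between 0 and 1 (in PacBio BAM files this is usually stored in the `rq` tag, which is an assumption about your files, not something stated above). The function names are made up for illustration.

```python
import math

# Assumption: a read counts as HiFi when its predicted accuracy is >= 0.99,
# which is the accuracy that corresponds to Phred Q20.
HIFI_MIN_RQ = 0.99


def rq_to_phred(rq):
    """Convert a predicted read accuracy (0..1) to a Phred-scaled quality."""
    if rq >= 1.0:
        return float("inf")  # a perfect score has no finite Phred value
    return -10.0 * math.log10(1.0 - rq)


def is_hifi(rq):
    """True if the read passes the Q20 (99% accuracy) threshold."""
    return rq >= HIFI_MIN_RQ
```

An accuracy of 0.99 corresponds to Q20 and 0.999 to Q30, so a *.ccs.bam that still contains reads below 0.99 would need filtering before it matches a *.hifi_reads.bam.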

Working from just the fasta/fastq should be fine for what you're trying to do. There is often additional information in the BAM files necessary for certain analyses (kinetics/basemods) but for your purpose working with the fasta/fastqs should be sufficient.

8 weeks ago

Don't worry about the raw BAM file; it has no data that is useful for you at this stage. The CCS reads are what you need.

Use minimap2 to align your reads. It has several presets:

   Preset:
    -x STR       preset (always applied before other options; see minimap2.1 for details) []

                 - map-pb/map-ont - PacBio CLR/Nanopore vs reference mapping
                 - map-hifi - PacBio HiFi reads vs reference mapping
                 - ava-pb/ava-ont - PacBio/Nanopore read overlap
                 - asm5/asm10/asm20 - asm-to-ref mapping, for ~0.1/1/5% sequence divergence
                 - splice/splice:hq - long-read/Pacbio-CCS spliced alignment
                 - sr - genomic short-read mapping
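For the workflow in the question, a rough Python sketch could look like the following. It only builds the two minimap2 command lines (map-hifi for reads vs. assembly, asm5 for contigs vs. organelle references) and then parses the results. Everything beyond the presets listed above is an assumption: the file names are placeholders, the depth table is assumed to come from `samtools coverage` on the sorted read alignment, the contig-vs-organelle alignment is assumed to be in minimap2's default PAF format, and asm5 may be too strict if the organelle references come from a distant species (asm10 or asm20 would then be the alternative).

```python
import csv
import shlex


def minimap2_reads_cmd(assembly, reads, threads=8):
    """HiFi reads vs. assembly; SAM goes to stdout (pipe into `samtools sort`)."""
    return ["minimap2", "-ax", "map-hifi", "-t", str(threads), assembly, reads]


def minimap2_organelle_cmd(organelle_fasta, contigs):
    """Contigs vs. organelle references; PAF goes to stdout."""
    return ["minimap2", "-x", "asm5", organelle_fasta, contigs]


def low_depth_contigs(coverage_tsv, min_mean_depth=1.0):
    """Names of contigs whose mean depth is below the threshold.

    `coverage_tsv` is an open text handle on `samtools coverage` output,
    whose header line starts with `#rname` and includes `meandepth`.
    The default threshold of 1.0 is an arbitrary illustration.
    """
    reader = csv.DictReader(coverage_tsv, delimiter="\t")
    return [row["#rname"] for row in reader
            if float(row["meandepth"]) < min_mean_depth]


def organelle_fraction(paf_lines):
    """Fraction of each contig covered by alignments to organelle references.

    Uses PAF columns 1-4 (query name, length, start, end) and merges
    overlapping alignments so that no base is counted twice.
    """
    spans, lengths = {}, {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        name = fields[0]
        lengths[name] = int(fields[1])
        spans.setdefault(name, []).append((int(fields[2]), int(fields[3])))

    fractions = {}
    for name, intervals in spans.items():
        covered, current_end = 0, 0
        for start, end in sorted(intervals):
            start = max(start, current_end)  # drop the already-counted part
            if end > start:
                covered += end - start
                current_end = end
        fractions[name] = covered / lengths[name]
    return fractions


if __name__ == "__main__":
    # Placeholder file names; replace with your own paths.
    print(shlex.join(minimap2_reads_cmd("asm.p_ctg.fa", "reads.fastq.gz")))
    print(shlex.join(minimap2_organelle_cmd("organelles.fa", "asm.p_ctg.fa")))
```

Contigs returned by `low_depth_contigs` are candidates for removal, and contigs with a high value from `organelle_fraction` are candidates for being mitochondrial or chloroplast sequence. The cutoffs you apply to both are judgment calls that depend on your genome and coverage.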