Question

Pacbio de-novo assembly

17 months ago

Ap1438 ▴ 50

Hi,
I recently received PacBio HiFi reads, generated in CCS mode, for a plant whole-genome de novo assembly.
The sequencing facility sent two file types:
a FASTQ (.fastq.gz) file and a BAM file.
I am confused about two things.

From my understanding, PacBio sequencing output is in BAM format by default. However, I suspect the BAM file I received is not the raw output but was produced by the CCS software with a command like "ccs movie.subreads.bam movie.ccs.bam".
I used hifiasm to generate primary contigs from the FASTQ file. Now I want to align the HiFi reads back to the assembly and filter out contigs with a read depth close to 0, and also align the contigs to plant mitochondrial and chloroplast genome sequences to detect organellar contigs. I am not sure which aligner to use; I have come across pbalign and minimap2 for this purpose.

I am new to working with PacBio data, so please let me know if you have any suggestions. Note: this is WGS HiFi data (not RNA-seq data).
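As a rough sketch of the depth filter described in the question, the Python below computes an approximate per-contig depth from a PAF file of HiFi reads mapped to the contigs (for example with minimap2's map-hifi preset). The helper names, file names and the 1x cut-off are hypothetical. "Depth" here is simply the summed aligned span of primary alignments divided by contig length, not an exact per-base depth. Contigs with no alignments never appear in the PAF, so the full contig list has to come from the assembly itself.

```python
from collections import defaultdict


def contig_depths(paf_lines):
    """Return {contig: approximate mean depth} from PAF records (reads vs contigs)."""
    covered = defaultdict(int)  # summed aligned bases per contig
    lengths = {}                # contig lengths from PAF column 7
    for line in paf_lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        tname, tlen, tstart, tend = f[5], int(f[6]), int(f[7]), int(f[8])
        # skip secondary alignments (tp:A:S) so each read is counted once
        if "tp:A:S" in f[12:]:
            continue
        lengths[tname] = tlen
        covered[tname] += tend - tstart
    return {name: covered[name] / length for name, length in lengths.items()}


def low_depth_contigs(depths, all_contigs, min_depth=1.0):
    """Contigs below min_depth, including contigs with no alignments at all."""
    return sorted(c for c in all_contigs if depths.get(c, 0.0) < min_depth)
```

With a real assembly you would pass the lines of the PAF file as paf_lines and read all_contigs from the primary-contig FASTA headers.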

Pacbio minimap2 de-novo assembly alligner • 2.1k views

Updated 14 months ago by gconcepcion ▴ 410 • written 17 months ago by Ap1438 ▴ 50

score 3 · Answer 1 · 2022-11-28

17 months ago

gconcepcion ▴ 410

pbalign is outdated. pbmm2 is its spiritual successor and is really just a wrapper for minimap2: https://github.com/PacificBiosciences/pbmm2

pbmm2 can align either a bam OR a fasta/fastq to a reference genome

Is your CCS BAM file suffixed with *.ccs.bam or *.hifi_reads.bam? If the latter, it is the HiFi subset of CCS reads; if the former, it likely contains ALL CCS reads in the dataset, not filtered to Q20 (≥99% predicted accuracy) reads.
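To make the Q20 cut-off concrete: a Phred quality value Q corresponds to an error rate of 10^(-Q/10), so Q20 means 99% predicted accuracy. The small sketch below converts a per-read predicted accuracy to a QV and applies the HiFi threshold, and also encodes the filename convention from the answer. The helper names are hypothetical, and the idea that CCS BAMs store this per-read accuracy in an `rq` tag is my assumption, not something stated in this thread.

```python
import math


def phred_qv(accuracy):
    """Convert a predicted read accuracy (0-1) to a Phred-scaled quality value."""
    if accuracy >= 1.0:
        return math.inf
    return -10 * math.log10(1 - accuracy)


def is_hifi(accuracy, min_qv=20):
    """HiFi = CCS read with predicted accuracy of at least Q20 (99%)."""
    # tiny tolerance so that accuracy == 0.99 is not lost to float rounding
    return phred_qv(accuracy) >= min_qv - 1e-9


def describe_bam(filename):
    """Guess the read set from the suffix conventions described in the answer."""
    if filename.endswith(".hifi_reads.bam"):
        return "HiFi subset (>= Q20)"
    if filename.endswith(".ccs.bam"):
        return "likely all CCS reads (not Q20-filtered)"
    return "unknown"
```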

Working from just the FASTA/FASTQ should be fine for what you're trying to do. The BAM files often carry additional information needed for certain analyses (kinetics/base modifications), but for your purpose the FASTA/FASTQ files should be sufficient.

17 months ago by gconcepcion ▴ 410

So, the pbmm2 command is a little bit confusing. It has a --preset option with a CCS mode:

--preset                  STR   Set alignment mode. See below for preset parameter details. Valid choices: (SUBREAD,
                              CCS, ISOSEQ, UNROLLED). [SUBREAD]

Should I use the CCS preset here, or just do normal indexing and alignment?

pbmm2 index [options] <ref.fa|xml> <out.mmi>
pbmm2 align [options] <ref.fa|xml|mmi> <in.bam|xml|fa|fq|gz|fofn> [out.aligned.bam|xml]

Or

pbmm2 index [options] <ref.fa|xml> <out.mmi> --preset CCS

pbmm2 align [options] <ref.fa|xml|mmi> <in.bam|xml|fa|fq|gz|fofn> [out.aligned.bam|xml] --preset CCS

I ask because I used the HiFi reads to generate the primary assembly,
and now I want to map the HiFi reads back to the assembly to remove contigs with 0 reads mapped.

Since my HiFi reads were generated by CCS, I am not sure which option to use.

14 months ago by Ap1438 ▴ 50

You must be using an older version of pbmm2. In recent versions the pbmm2 align --help output shows:

  --preset                   STR    Set alignment mode. See below for preset parameter details. Valid choices:
                            (SUBREAD, CCS, HIFI, ISOSEQ, UNROLLED). [SUBREAD]

HIFI reads are just a subset of CCS reads, so the same preset works here. The default is SUBREAD. The only difference between the SUBREAD and CCS/HiFi presets is the -u parameter, which disables homopolymer-compressed k-mers: if you don't specify a preset, homopolymers are compressed when picking k-mer seeds. This will probably have only a minimal/subtle impact on your alignments, so running without any preset probably won't make much of a difference.
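To illustrate what "homopolymer-compressed k-mers" means, here is a toy Python sketch, assuming (as with minimap2's -H option) that compression only affects the k-mers used for seeding, not the final alignment. It is not pbmm2's or minimap2's actual code, and the function names are hypothetical. It shows why two reads that differ only in the length of a homopolymer run still share seeds once runs are collapsed.

```python
from itertools import groupby


def homopolymer_compress(seq):
    """Collapse each run of identical bases to a single base (e.g. AAAC -> AC)."""
    return "".join(base for base, _ in groupby(seq.upper()))


def kmers(seq, k):
    """All k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    a, b = "ACGTTTTGCA", "ACGTTTGCA"  # same sequence, one T shorter in b
    print(set(kmers(a, 5)) == set(kmers(b, 5)))  # raw k-mers differ
    print(set(kmers(homopolymer_compress(a), 5)) == set(kmers(homopolymer_compress(b), 5)))
```

Running it prints False and then True: the raw 5-mers differ, while the compressed ones are identical.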

Alignment modes of --preset:
    SUBREAD     : -k 19 -w 10    -o 5 -O 56 -e 4 -E 1 -A 2 -B 5 -z 400 -Z 50  -r 2000   -L 0.5 -g 5000
    CCS or HiFi : -k 19 -w 10 -u -o 5 -O 56 -e 4 -E 1 -A 2 -B 5 -z 400 -Z 50  -r 2000   -L 0.5 -g 5000
    ISOSEQ      : -k 15 -w 5  -u -o 2 -O 32 -e 1 -E 0 -A 1 -B 2 -z 200 -Z 100 -r 200000 -L 0.5 -g 2000 -C 5 -G 200000
    UNROLLED    : -k 15 -w 15    -o 2 -O 32 -e 1 -E 0 -A 1 -B 2 -z 200 -Z 100 -r 2000   -L 0.5 -g 10000
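One way to confirm from the table that SUBREAD and CCS/HiFi differ only by -u is to parse the two option strings and diff them. The strings below are copied from the help output; the parser is a hypothetical helper that assumes no option takes a negative number as its value (true for these presets).

```python
# Preset option strings copied from the pbmm2 help text.
PRESETS = {
    "SUBREAD": "-k 19 -w 10 -o 5 -O 56 -e 4 -E 1 -A 2 -B 5 -z 400 -Z 50 -r 2000 -L 0.5 -g 5000",
    "CCS": "-k 19 -w 10 -u -o 5 -O 56 -e 4 -E 1 -A 2 -B 5 -z 400 -Z 50 -r 2000 -L 0.5 -g 5000",
}


def parse_preset(spec):
    """Turn '-k 19 -u ...' into {'-k': '19', '-u': True, ...}."""
    tokens = spec.split()
    opts = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        # a flag followed by another flag (or by nothing) is a boolean switch
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            opts[flag] = tokens[i + 1]
            i += 2
        else:
            opts[flag] = True
            i += 1
    return opts


def diff_presets(a, b):
    """Flags whose presence or value differs between two presets."""
    pa, pb = parse_preset(PRESETS[a]), parse_preset(PRESETS[b])
    return {f: (pa.get(f), pb.get(f)) for f in pa.keys() | pb.keys() if pa.get(f) != pb.get(f)}
```

diff_presets("SUBREAD", "CCS") returns {'-u': (None, True)}.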

Also, indexing isn't necessary. Just run pbmm2 align ...
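A minimal sketch of that single-step call, built as an argument list in Python following the "pbmm2 align [options] <ref> <in> [out]" usage shown earlier in the thread. The file names and helper name are hypothetical, and the --sort flag (asking for a sorted BAM, which is convenient for depth calculations) is my assumption; check pbmm2 align --help before relying on it.

```python
import shlex


def pbmm2_align_cmd(ref, reads, out_bam, preset="CCS", sort=True):
    """Build argv for `pbmm2 align [options] <ref> <reads> <out.bam>` (no index step)."""
    cmd = ["pbmm2", "align", "--preset", preset]
    if sort:
        # assumption: --sort requests a sorted output BAM; verify with `pbmm2 align --help`
        cmd.append("--sort")
    return cmd + [ref, reads, out_bam]


if __name__ == "__main__":
    # hypothetical names: hifiasm primary contigs converted to FASTA, plus the HiFi FASTQ
    argv = pbmm2_align_cmd("primary_contigs.fa", "hifi_reads.fastq.gz", "reads_vs_contigs.bam")
    print(shlex.join(argv))
    # to actually run it: subprocess.run(argv, check=True)
```

This prints: pbmm2 align --preset CCS --sort primary_contigs.fa hifi_reads.fastq.gz reads_vs_contigs.bam. CCS is used as the default because, per the thread, older pbmm2 versions have no HIFI preset while CCS works in both.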

14 months ago by gconcepcion ▴ 410

score 0 · Answer 2 · 2022-11-28

Don't worry about the raw BAM file; it has no data that is useful to you at this stage. The CCS reads are what you need.

Use minimap2 to align your reads; it has several preset modes:

   Preset:
    -x STR       preset (always applied before other options; see minimap2.1 for details) []

                 - map-pb/map-ont - PacBio CLR/Nanopore vs reference mapping
                 - map-hifi - PacBio HiFi reads vs reference mapping
                 - ava-pb/ava-ont - PacBio/Nanopore read overlap
                 - asm5/asm10/asm20 - asm-to-ref mapping, for ~0.1/1/5% sequence divergence
                 - splice/splice:hq - long-read/Pacbio-CCS spliced alignment
                 - sr - genomic short-read mapping
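For the two jobs in the question, the presets above suggest map-hifi for mapping HiFi reads to the contigs and one of the asm presets for aligning contigs to organelle genomes. The sketch below builds those minimap2 command lines and then flags contigs whose length is mostly covered by hits to a mitochondrial/chloroplast reference. The choice of asm20 (the most divergence-tolerant asm preset listed), the 50% coverage cut-off, the file names and the helper names are all assumptions for illustration, not a validated organelle-detection pipeline.

```python
from collections import defaultdict


def minimap2_cmd(preset, target, query):
    """minimap2 -x <preset> <target> <query>; PAF is written to stdout."""
    return ["minimap2", "-x", preset, target, query]


def merged_length(intervals):
    """Total length covered by half-open intervals, counting overlaps once."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def organellar_contigs(paf_lines, min_fraction=0.5):
    """Contigs (PAF queries) whose length is mostly covered by organelle hits."""
    hits, lengths = defaultdict(list), {}
    for line in paf_lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        qname, qlen, qstart, qend = f[0], int(f[1]), int(f[2]), int(f[3])
        lengths[qname] = qlen
        hits[qname].append((qstart, qend))
    return sorted(q for q in lengths if merged_length(hits[q]) / lengths[q] >= min_fraction)


if __name__ == "__main__":
    # hypothetical file names; redirect each command's stdout to a .paf file
    print(minimap2_cmd("map-hifi", "primary_contigs.fa", "hifi_reads.fastq.gz"))
    print(minimap2_cmd("asm20", "organelles.fa", "primary_contigs.fa"))
```

Note that the contigs are the query in the second command, so their names and lengths are read from PAF columns 1 and 2.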