Collapsing PacBio reads into a single sequence
18 months ago
Sara ▴ 220

We used PacBio technology to sequence our samples (a single gene only). After demultiplexing and running the ccs tool, we now have one ccs.bam file per sample. We would like to collapse all the sequences from a sample (which are now in the same .ccs.bam file) into one sequence to use for downstream analysis. Does anyone have experience with tools and methods we could use for this?

pacbio • 1.1k views

Are you sure that is correct? Documentation for the ccs tool (LINK) seems to indicate that you should get HiFi consensus reads.


GenoMax, yes, you do get consensus sequences, in BAM format. If you convert the file to FASTA, you will see that there are multiple entries, meaning there are many sequences. My goal is to get only one (consensus) sequence.


But the schematic seems to show a single consensus being called. Guess that is not correct? You did use the subreads.bam file as input, correct?


GenoMax, yes, subreads.bam is the input for the ccs tool.

18 months ago
Tm ★ 1.1k

First you will have to extract the reads in FASTQ format from the BAM file using samtools fastq or a similar tool. For example:

samtools fastq -0 sample_output.fastq -@ 30 input.bam

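Here, -0 gives the output FASTQ file name (this file receives reads that are not flagged as READ1 or READ2, which is the case for single-end PacBio reads) and -@ sets the number of additional threads.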
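
As a quick sanity check after the conversion, the number of CCS reads in the FASTQ can be counted. This is only a minimal sketch and not part of the original answer: the function name `count_fastq_reads` is hypothetical, and it assumes an uncompressed file with the unwrapped four-line records that samtools fastq writes.

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file.
# Assumes exactly four lines per record (header, sequence, '+', qualities).
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Usage, with the FASTQ file that is later passed to canu:
# count_fastq_reads sample_output.fastq
```

A count well above one is expected here, since ccs produces one consensus read per sequenced molecule rather than one per sample.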

Once you have the FASTQ file, you can use a tool such as Canu, which performs PacBio read correction, trimming, and assembly in a single command. For example:

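canu -p sample -d sample_canu_output genomeSize=500m -pacbio-raw sample_output.fastq

Here, -p <assembly-prefix> sets the prefix for the output file names, -d <assembly-directory> sets the output directory, and genomeSize=<number>[g|m|k] is the estimated size of the target. For a single gene, set it to roughly the length of the sequenced region rather than the 500m shown above. The -pacbio-raw option marks the input as uncorrected PacBio reads; for CCS (HiFi) reads, recent Canu releases also provide -pacbio-hifi.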

However, assembly with any of these tools does not guarantee output in the form of a single scaffold, as the result depends on a number of factors.
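
To see whether the assembly did end up as a single sequence, the output FASTA can be inspected. The sketch below is an illustration under assumptions, not a definitive method: the helper names `count_fasta_records` and `longest_fasta_record` are hypothetical, and the path assumes Canu's usual `<assembly-directory>/<assembly-prefix>.contigs.fasta` output for the command above.

```shell
# Hypothetical helper: count sequences in a FASTA file by counting header lines.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Hypothetical helper: print the name and length (tab-separated) of the
# longest sequence. Handles sequences wrapped across several lines.
longest_fasta_record() {
    awk '
        /^>/ { name = substr($1, 2); next }   # header: keep the ID without ">"
        { len[name] += length($0) }           # sequence line: add to the total
        END {
            for (n in len) if (len[n] > max) { max = len[n]; best = n }
            print best "\t" max
        }
    ' "$1"
}

# Usage (path is an assumption based on the canu command above):
# count_fasta_records sample_canu_output/sample.contigs.fasta
# longest_fasta_record sample_canu_output/sample.contigs.fasta
```

If more than one sequence is reported, picking the longest one is only one possible convention; whether it is appropriate depends on the downstream analysis.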