How to demultiplex PacBio from CCS.h5 or fastq
9.2 years ago

I have PacBio CCS.h5 files and the corresponding FASTA and FASTQ files, and I would like to demultiplex them. Does anyone know how this can be done in the absence of bas.h5 files?

Thanks for your help!

Mandy

9.1 years ago
Felix Francis ▴ 600

You can use the HMMER package to identify the barcodes. The HMMs for the start and end barcodes can each be pinned probabilistically, and independently of one another, to the positions in the read (start_pos and end_pos) where the barcodes are expected to occur.

The two ends can then be considered together by adding the log-likelihood scores of the start_pos and end_pos HMM hits for each barcode combination under consideration. The hypotheses are the barcode combinations that were actually used for multiplexing.
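The scoring step described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `best_barcode_pair`, the barcode labels, and the score values are all hypothetical, and the per-end log-likelihood scores are assumed to have been obtained already (for example from HMM hits near each end of the read).

```python
def best_barcode_pair(start_scores, end_scores, allowed_pairs):
    """Pick the allowed (start, end) barcode pair with the highest summed score.

    start_scores / end_scores: dict mapping barcode name -> log-likelihood
    score of the best hit near the read start / read end (hypothetical values).
    allowed_pairs: the barcode combinations actually used for multiplexing.
    """
    best_pair, best_score = None, float("-inf")
    for start_bc, end_bc in allowed_pairs:
        # Skip combinations for which one end produced no hit at all.
        if start_bc not in start_scores or end_bc not in end_scores:
            continue
        # Log-likelihoods of independent events add.
        score = start_scores[start_bc] + end_scores[end_bc]
        if score > best_score:
            best_pair, best_score = (start_bc, end_bc), score
    return best_pair, best_score


# Hypothetical scores for a single read.
start_scores = {"bc1": -2.0, "bc2": -9.5}
end_scores = {"bc1": -8.0, "bc2": -3.0, "bc3": -2.5}
allowed_pairs = [("bc1", "bc2"), ("bc2", "bc1")]

pair, score = best_barcode_pair(start_scores, end_scores, allowed_pairs)
print(pair, score)
```

Restricting the search to `allowed_pairs` is what lets the two ends inform each other: a combination such as ("bc1", "bc3") is never considered if it was not part of the experiment, even when its end score looks good in isolation.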

4.0 years ago

With BAM files, you can easily extract the reads that carry a given barcode pair using the commands below. This only works for exact barcode matches, so it is not suitable when the barcode sequences contain base errors.

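Example: the pattern covers both read orientations, with the two alternatives joined by "|" (or).

Orientation 1:

forward barcode = "CAAGCTCACT"

sequence between barcodes = ".*"

reverse complement of the reverse barcode = "GCACGACTTG"

Orientation 2:

reverse barcode = "CAAGTCGTGC"

sequence between barcodes = ".*"

reverse complement of the forward barcode = "AGTGAGCTTG"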

# Keep header lines (starting with "@") and reads whose sequence (SAM field 10) matches either orientation.
samtools view -h pacbio_reads.ccs.bam \
  | awk -F'\t' '/^@/ || $10 ~ /CAAGCTCACT.*GCACGACTTG|CAAGTCGTGC.*AGTGAGCTTG/' \
  | samtools view -b -o pacbio_reads.ccs.demultiplex.bam -
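Since exact matching misses reads with errors in the barcodes, a mismatch-tolerant check may be useful. The Python sketch below is one possible approach, not part of the original answer. It assumes the barcodes sit at the very ends of each read, and it counts substitutions only (Hamming distance), so insertions or deletions inside a barcode are not handled. The helper names and the `max_mismatches` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_barcode_pair(read, fwd, rev, max_mismatches=1):
    """True if the read is flanked by the barcode pair in either orientation.

    Assumption: barcodes are located exactly at the read ends.
    """
    orientations = (
        (fwd, revcomp(rev)),  # forward barcode ... revcomp(reverse barcode)
        (rev, revcomp(fwd)),  # reverse barcode ... revcomp(forward barcode)
    )
    for left, right in orientations:
        if len(read) < len(left) + len(right):
            continue  # read too short to contain both barcodes
        left_ok = hamming(read[:len(left)], left) <= max_mismatches
        right_ok = hamming(read[-len(right):], right) <= max_mismatches
        if left_ok and right_ok:
            return True
    return False


fwd, rev = "CAAGCTCACT", "CAAGTCGTGC"
# Last base of the forward barcode changed from T to A (one substitution).
read = "CAAGCTCACA" + "ACGTACGT" + "GCACGACTTG"
print(has_barcode_pair(read, fwd, rev))                    # tolerant match
print(has_barcode_pair(read, fwd, rev, max_mismatches=0))  # exact match only
```

With `max_mismatches=0` this reduces to an end-anchored version of the exact match above. Handling indels would need an edit-distance or alignment-based comparison instead of the Hamming distance.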