Question: How to split BAM files by samples from PacBio
5 months ago by
bwczech60 wrote:


I have sequences from 6 bacterial samples (E. coli and Salmonella). The 6 samples were sequenced on PacBio across 4 movies, so I have 3 bax files for every movie (12 in total).

I converted the bax files to BAM using bax2bam and then used lima for demultiplexing (with a set of about 330 barcodes in a FASTA file). Lima's output was 4 BAM files (one per movie).

The question is, how can I filter/split those files by sample? I would like to get 6 files, one for each sample.

Thank you in advance.

pacbio bam bax • 360 views
modified 5 months ago by jharting0 • written 5 months ago by bwczech60

Demultiplexing is usually splitting by sample. Were the right barcodes used?

written 5 months ago by WouterDeCoster38k

Yes. My barcode FASTA file contains about 300 barcodes...

written 5 months ago by bwczech60

What is the exact lima command line you used to split by barcode? Did you use the --split-bam parameter?

modified 5 months ago • written 5 months ago by gconcepcion60

Yes, I tried that option and lima created about 900 BAM files (each with a barcode prefix). With so many BAM files, I do not know how to identify my samples.

written 5 months ago by bwczech60
5 months ago by
jharting0 wrote:

Try using the --split-bam-named option to label the outputs by the headers from the barcode set. You can also use the option --peek-guess to filter out undesirable barcode pairs, which should reduce the number of output files. Look in the file "lima.guess" for information on which barcode pairs were inferred from your inputs.
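
As a rough illustration of the suggestion above, here is a minimal Python sketch that assembles a lima command line with both options. The file names (movie1.subreads.bam, barcodes.fasta, movie1.demux.bam) and the helper name build_lima_command are hypothetical, and the argument order (input BAM, barcode FASTA, output BAM) is assumed from lima's usual usage; check `lima --help` for your installed version before running anything.

```python
import shlex


def build_lima_command(subreads_bam, barcodes_fasta, output_bam):
    """Assemble (but do not run) a lima call for one movie.

    Assumption: lima takes <input.bam> <barcodes.fasta> <output.bam>
    as positional arguments, after the options.
    """
    return [
        "lima",
        "--split-bam-named",  # one output BAM per barcode pair, named by barcode header
        "--peek-guess",       # infer which barcode pairs are really present
        subreads_bam,
        barcodes_fasta,
        output_bam,
    ]


# Hypothetical file names for one of the four movies.
command = build_lima_command("movie1.subreads.bam", "barcodes.fasta", "movie1.demux.bam")
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run once the paths point at real files. Repeating this for each movie keeps the per-movie outputs separate.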

written 5 months ago by jharting0

OK, thank you, now I understand. I used those parameters for lima, but there is something I do not understand. I used 3 barcodes for 1 run (3 samples were sequenced on each run), but lima produced 6 BAMs, one for every combination of my barcodes (025 forward with 025 forward; 025 forward with 0032 forward; 0032 forward with 0032 forward etc.). Could you tell me why I have 6 files instead of 3? How can I identify my samples? Thank you in advance.

written 5 months ago by bwczech60

You should also be using the option --same, assuming your barcodes are the same on both ends of the insert (which is the only possibility when adding barcodes to sheared libraries). This option filters out any read that has different barcodes on the two ends of the insert. There are various reasons why you might see asymmetric barcodes on inserts even in a library prepped with the same barcode on both ends (read error, small levels of contamination), but the counts of asymmetric reads should be much lower than the counts of the symmetric, expected reads.
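
To make the six-versus-three point concrete: three barcodes give six unordered pairs, three symmetric (A with A) and three asymmetric (A with B), and only the symmetric ones correspond to samples. The Python sketch below picks out the symmetric outputs and attaches sample names. It assumes the per-pair files are named like prefix.barcodeA--barcodeB.bam; the barcode names (bc1025, bc1032) and the sample labels are hypothetical placeholders, so adjust the pattern and the mapping to what lima actually wrote for you.

```python
import re
from pathlib import Path

# Assumed output naming: <prefix>.<barcodeA>--<barcodeB>.bam
PAIR_PATTERN = re.compile(r"\.(?P<first>[^.]+?)--(?P<second>[^.]+)\.bam$")


def symmetric_barcode(bam_name):
    """Return the barcode name if both ends carry the same barcode, else None."""
    match = PAIR_PATTERN.search(Path(bam_name).name)
    if match and match.group("first") == match.group("second"):
        return match.group("first")
    return None


def assign_samples(bam_names, barcode_to_sample):
    """Map sample name -> BAM file, keeping only symmetric barcode pairs."""
    assigned = {}
    for name in bam_names:
        barcode = symmetric_barcode(name)
        if barcode in barcode_to_sample:
            assigned[barcode_to_sample[barcode]] = name
    return assigned


# Hypothetical outputs for one movie and a hypothetical barcode-to-sample sheet.
bam_names = [
    "movie1.bc1025--bc1025.bam",
    "movie1.bc1025--bc1032.bam",  # asymmetric pair, ignored
    "movie1.bc1032--bc1032.bam",
]
samples = assign_samples(bam_names, {"bc1025": "ecoli_1", "bc1032": "salmonella_1"})
for sample, bam in samples.items():
    print(sample, bam)
```

The barcode-to-sample mapping has to come from your own library prep records; lima only knows barcode names, not which organism each one was attached to.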

written 5 months ago by jharting0