Question: Fastx-Toolkit: Fastx_Barcode_Splitter
k.nirmalraman wrote (5.5 years ago):

Dear All,

I have been using fastx_barcode_splitter to demultiplex my reads. Today I found that some reads did not match any of the barcodes we used in the experiment. On closer inspection, these reads were not sorted because they had at least one extra base at the beginning of the read.

Example FASTA sequence:

 >HWI-ST863:238:C20G3ACXX:4:1204:18858:57161 1:N:0:AAACAAAA



This is nevertheless a match, but the read is not sorted into the corresponding barcode file.
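To see why a single extra leading base can defeat mismatch-based matching, here is a small Python sketch. It assumes that `--bol --mismatches` compares the barcode with the first bases of the read position by position (a Hamming-style count). Because the original example sequence is not shown, the barcode `ACGTAC` and the read are hypothetical. A one-base shift turns nearly every position into a mismatch, while the edit (Levenshtein) distance counts it as a single insertion.

```python
def start_mismatches(barcode: str, read: str) -> int:
    """Position-by-position mismatches between the barcode and the start of the read.

    Assumed model of --bol --mismatches matching; bases missing from a read
    shorter than the barcode count as mismatches.
    """
    prefix = read[: len(barcode)]
    mismatches = sum(a != b for a, b in zip(barcode, prefix))
    return mismatches + (len(barcode) - len(prefix))


def levenshtein(a: str, b: str) -> int:
    """Edit distance where substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


barcode = "ACGTAC"              # hypothetical barcode
read = "G" + barcode + "TTGCA"  # hypothetical read with one extra leading base

print(start_mismatches(barcode, read))                 # 6
print(levenshtein(barcode, read[: len(barcode) + 1]))  # 1
```

Under this model, a `--mismatches 3` threshold rejects the shifted read, whereas an edit-distance threshold of 3 accepts it.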

The command I use is the following:

cat <file_name> | fastx_barcode_splitter.pl --bcfile mybarcodes.txt --bol --mismatches 3 --prefix code_ --suffix "_1" > code_1.stats

I tried the --partial option, but it is very slow (I almost had to kill the process), and it did not noticeably improve the splitting.

Can someone help me find a better way to handle this? Is there another splitter that is easy to use and can be integrated into an existing pipeline?


Is there any known explanation for that extra nucleotide at the beginning of your reads?

(Comment by Manu Prestat)

I can't think of any explanation other than barcode contamination during synthesis/purification.

(Reply by k.nirmalraman)
Carlos Borroto (Washington Metropolitan Area) wrote (5.5 years ago):

ngs-tools, a tool I wrote, supports this use case.

The easiest way to try ngs-tools is to install it using pip (preferably inside a virtualenv):

$ pip install ngs-tools

Please see the help for the split-by-barcode command:

$ ngs-tools split-by-barcode --help

I adapted the code in this gist for this command. It uses Levenshtein distance to look for partial matches; by default, the maximum distance is 3. BTW, <barcode_file> is a tab-delimited file with two columns, "barcode_id" and "barcode_seq". For example:


This tool also has a Galaxy wrapper.
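Matching barcodes by Levenshtein distance can be sketched in Python as follows. This is not the ngs-tools code; it is a minimal illustration under stated assumptions: the barcode file is tab-delimited with barcode_id and barcode_seq columns (an optional header line is skipped), the barcode sits at the start of the read, and a read goes to the single barcode with the smallest edit distance to any prefix of the read, using a maximum distance of 3. The function names, the example barcodes and the tie-handling rule are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, TextIO


def load_barcodes(handle: TextIO) -> Dict[str, str]:
    """Parse a tab-delimited barcode_id / barcode_seq file into {id: sequence}."""
    barcodes = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0] == "barcode_id":  # skip blanks and an optional header
            continue
        barcodes[row[0]] = row[1].strip().upper()
    return barcodes


def prefix_edit_distance(barcode: str, read: str, max_dist: int) -> int:
    """Smallest Levenshtein distance between the barcode and any prefix of the read.

    Rows walk the barcode and columns walk the read; taking the minimum of the
    final row lets the barcode end anywhere, so the rest of the read is free.
    """
    window = read[: len(barcode) + max_dist]  # longer prefixes cannot score <= max_dist
    prev = list(range(len(window) + 1))       # extra leading read bases cost 1 each
    for i, bc_base in enumerate(barcode, 1):
        cur = [i]
        for j, read_base in enumerate(window, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (bc_base != read_base)))
        prev = cur
    return min(prev)


def assign(read: str, barcodes: Dict[str, str], max_dist: int = 3) -> Optional[str]:
    """Return the id of the unique closest barcode within max_dist, else None."""
    scored = sorted((prefix_edit_distance(seq, read.upper(), max_dist), bc_id)
                    for bc_id, seq in barcodes.items())
    if not scored or scored[0][0] > max_dist:
        return None  # no barcode is close enough
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two barcodes are equally close
    return scored[0][1]


barcodes = load_barcodes(io.StringIO("barcode_id\tbarcode_seq\nbc01\tACGTAC\nbc02\tTTGCAA\n"))
print(assign("GACGTACTTGCAGG", barcodes))  # bc01 (one extra leading base)
print(assign("CCCCCCCCCC", barcodes))      # None
```

With short barcodes, a maximum distance of 3 can be permissive, so rejecting ties between equally close barcodes is one way to avoid misassigning reads; whether ngs-tools handles ties this way is not stated here.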
