Fastx-Toolkit: Fastx_Barcode_Splitter
10.6 years ago
k.nirmalraman ★ 1.1k

Dear All,

I have been using fastx_barcode_splitter to demultiplex my reads. Today I found that some reads were not assigned to any of the barcodes used in the experiment. On closer inspection, these reads were left unsorted because they carry at least one extra base at the start of the read, before the barcode.

Example FASTA sequence:

 >HWI-ST863:238:C20G3ACXX:4:1204:18858:57161 1:N:0:AAACAAAA
 TACTTACCTACTTCCGCTGGTCATCCTGCGCCAATTTGATGTGTGTGGTTTTTAATTGAGCTGTATAATCTGTTTATTTTGAGGCCAAAAAAAAAAAA

Barcode: ACTTACCTACTT

TACTTACCTACTTCCGCTGGTCATCCTGCGCCAATTTGATGTGTGTGGTTTTTAATTGAGCTGTATAATCTGTTTATTTTGAGGCCAAAAAAAAAAAA
_ACTTACCTACTT

The barcode matches the read exactly once the first base is skipped, but the read is not sorted into the corresponding barcode file.
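To make the offset problem concrete, here is a minimal Python sketch. It assumes that a splitter anchored at the start of the read compares barcode and read position by position, without allowing insertions or deletions; this is an assumption about the behaviour, not taken from the fastx_barcode_splitter source. The helper name `hamming` is illustrative only.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


read = "TACTTACCTACTTCCGCTGGTCATCC"  # first bases of the example read
barcode = "ACTTACCTACTT"

# Compare the barcode with the read at offset 0 (as sequenced)
# and at offset 1 (skipping the extra leading base).
for offset in (0, 1):
    window = read[offset:offset + len(barcode)]
    print(offset, window, hamming(window, barcode))
```

At offset 0 the window differs from the barcode at 9 of 12 positions, well above the 3 mismatches allowed in the command below; at offset 1 it matches exactly. A single extra base is therefore enough to defeat a purely positional comparison.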

The command I use is the following:

cat <file_name> | fastx_barcode_splitter.pl --bcfile mybarcodes.txt --bol --mismatches 3   --prefix code_ --suffix "_1" > code_1.stats

I tried the --partial option, but it was extremely slow (I almost had to kill the process) and it did not noticeably improve the barcode splitting.

Can someone help me understand whether there is a better way to handle this? Is there any other splitter that is easy to use and can be integrated into an existing pipeline?

barcode split • 6.3k views

Is there any known explanation for that extra nucleotide at the beginning of your reads?


The only explanation I can come up with is barcode contamination during synthesis or purification, but I am not sure.

10.6 years ago
Carlos Borroto ★ 2.1k

ngs-tools, a tool I wrote, supports this use case.

The easiest way to try ngs-tools is to install it using pip (preferably inside a virtualenv):

$ pip install ngs-tools

Please see the help for the split-by-barcode command:

ngs-tools split-by-barcode --help

I adapted the code in this gist for this command. The code uses Levenshtein distance to look for partial matches; by default the maximum distance is 3. Note that <barcode_file> is a tab-delimited file with two columns, "barcode_id" and "barcode_seq". For example:

B01    ACTTACCTACTT
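For readers unfamiliar with the idea, the following is a rough Python sketch of barcode assignment by Levenshtein distance. It is not the ngs-tools implementation or the gist code; the function names (`levenshtein`, `parse_barcodes`, `assign_barcode`) and the strategy of scanning read prefixes of varying length are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def parse_barcodes(lines):
    """Parse 'barcode_id<TAB>barcode_seq' lines into a dict."""
    barcodes = {}
    for line in lines:
        line = line.strip()
        if line:
            barcode_id, seq = line.split("\t")
            barcodes[barcode_id] = seq
    return barcodes


def assign_barcode(read, barcodes, max_distance=3):
    """Return (barcode_id, distance) of the closest barcode, or (None, None).

    Each barcode is compared with read prefixes whose length differs from
    the barcode length by at most max_distance, so that extra or missing
    bases at the start of the read are tolerated.
    """
    best_id, best_dist = None, None
    for barcode_id, seq in barcodes.items():
        shortest = max(0, len(seq) - max_distance)
        longest = min(len(read), len(seq) + max_distance)
        for k in range(shortest, longest + 1):
            d = levenshtein(read[:k], seq)
            if d <= max_distance and (best_dist is None or d < best_dist):
                best_id, best_dist = barcode_id, d
    return best_id, best_dist


barcodes = parse_barcodes(["B01\tACTTACCTACTT"])
print(assign_barcode("TACTTACCTACTTCCGCTGGTCATCC", barcodes))
```

For the read from the question, the extra leading base counts as a single deletion, so the read is assigned to B01 at distance 1 instead of being rejected.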

This tool also has a wrapper for Galaxy.
