Question: Demultiplex Illumina run using custom index configuration
7 months ago by
alba.rodriguezmeira wrote:

Hi all,

I am trying to demultiplex an Illumina run in which I have introduced barcode sequences in a custom configuration:

R1 - 6 bp (barcode 1) + 144 bp
R2 - 6 bp (barcode 2) + 144 bp

index read - i7 (8 bp)

I have used bcl2fastq to introduce the sequences of each barcode (barcode1+barcode2+i7) in the header of the read.

Bcl2fastq options:


Fastq line example:

@M01913:344:000000000-CGVBP:1:1101:17206:1578:AACGGT+TCCTTA 1:N:0:CTAAGTCATG




However, I can't find any suitable tool to further demultiplex these reads into individual fastq files corresponding to each unique barcode combination. Ideally, I would provide a sample sheet containing a sampleID and unique barcode combination (barcode1+barcode2+i7), and get individual fastq files named with the sampleID provided.

Any help/comments would be highly appreciated!

sequencing next-gen
modified 7 months ago by genomax • written 7 months ago by alba.rodriguezmeira

You can try using demuxbyname.sh from the BBMap suite. Run the program without any options and look at the in-line help (pasted below). Give it a try and see if you can figure this out. Otherwise I will do some more testing later when I have time.


Written by Brian Bushnell
Last modified Jan 7, 2020

Description:  Demultiplexes sequences into multiple files based on their names,
substrings of their names, or prefixes or suffixes of their names.
Allows unlimited output files while maintaining only a small number of open file handles.

Usage:
in=<file> in2=<file2> out=<outfile> out2=<outfile2> names=<string,string,string...>

Alternately:
in=<file> out=<outfile> delimiter=whitespace prefixmode=f
This will demultiplex by the substring after the last whitespace.

in=<file> out=<outfile> length=8 prefixmode=t
This will demultiplex by the first 8 characters of read names.

in=<file> out=<outfile> delimiter=: prefixmode=f
This will split on colons, and use the last substring as the name; useful for
demuxing by barcode for Illumina headers in this format:
@A00178:73:HH7H3DSXX:4:1101:13666:1047 1:N:0:ACGTTGGT+TGACGCAT
modified 7 months ago • written 7 months ago by genomax

A second suggestion is to skip moving the inline barcodes into the fastq headers, by removing the bcl2fastq options you listed above.

Then use sabre to demultiplex the data.

This will definitely work.

modified 7 months ago • written 7 months ago by genomax


Thanks so much for the response. I'll give it a try too, but my feeling is that I'll first have to reformat the headers to get all barcodes in the right position, rather than how they are at the moment:

@M01913:344:000000000-CGVBP:1:1101:17206:1578:AACGGT+TCCTTA 1:N:0:CTAAGTCATG

Reformatting to something like this would potentially work:

@M01913:344:000000000-CGVBP:1:1101:17206:1578 1:N:0:CTAAGTCATG+AACGGT+TCCTTA

I am not super familiar with awk/sed so I wouldn't know how to easily reformat the header in that sense. Any comments would be super welcome!
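For reference, the header rearrangement described above can be sketched with awk. This is only a sketch under some assumptions: the FASTQ uses strict 4-line records, every header has exactly the layout shown above (inline barcodes after the last colon of the read ID, i7 at the end of the comment), and `reformat_headers` is a made-up helper name, not part of any tool.

```shell
# Hypothetical helper: read FASTQ on stdin, write FASTQ with rearranged headers.
# Assumes 4-line records and headers like
#   @...:1578:AACGGT+TCCTTA 1:N:0:CTAAGTCATG
reformat_headers() {
    awk '
    NR % 4 == 1 {
        # $1 = read ID ending in ":BC1+BC2", $2 = comment ending in the i7 index
        match($1, /:[^:]*$/)                 # locate the last colon of the read ID
        inline = substr($1, RSTART + 1)      # BC1+BC2
        id     = substr($1, 1, RSTART - 1)   # read ID without the inline barcodes
        print id " " $2 "+" inline           # append BC1+BC2 after the i7 index
        next
    }
    { print }                                # sequence, "+", and quality lines unchanged
    '
}

# Example on a single made-up record
printf '%s\n' \
  '@M01913:344:000000000-CGVBP:1:1101:17206:1578:AACGGT+TCCTTA 1:N:0:CTAAGTCATG' \
  'ACGT' '+' 'IIII' | reformat_headers
```

With the header in this form, the full combination (i7+barcode1+barcode2) is the last colon-separated substring, which is the layout the `delimiter=: prefixmode=f` example in the help text above is described as handling.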

Unfortunately, sabre only supports the same barcode in forward and reverse reads for paired-end sequencing so that wouldn't work in this case (my R1 and R2 barcodes are always different).


written 7 months ago by alba.rodriguezmeira
7 months ago by
United States
genomax wrote:

You should be able to use demuxbyname.sh this way. I made a small dummy file.

$ more dem.fq
@HISEQ:267:CAAV9ANXX:4:1101:10050:2218:AACGGT+TCCTTA 1:N:0:AGTCAA
@HISEQ:267:CAAV9ANXX:4:1101:10050:2219:AATTGT+TCGGTA 1:N:0:AGTCAA
@HISEQ:267:CAAV9ANXX:4:1101:10050:2220:TTCGGT+GGCTTA 1:N:0:AGTCAA

$ demuxbyname.sh -Xmx5g in=dem.fq out=out_%.fq delimiter=: column=8

$ ls out*

This method has the unfortunate side effect of introducing a space into the file names, because we are using column 8 (where your UMIs are), and that column has a space after the index sequences.
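To see where the space comes from, the split can be imitated with `cut`. This only mimics the splitting, on the assumption that columns are counted from 1 as in the command above; it does not call the demultiplexer itself.

```shell
# Split one of the dummy headers on colons and keep the 8th field
header='@HISEQ:267:CAAV9ANXX:4:1101:10050:2218:AACGGT+TCCTTA 1:N:0:AGTCAA'
col8=$(printf '%s\n' "$header" | cut -d: -f8)

# Brackets make the embedded space visible
printf '[%s]\n' "$col8"
# prints: [AACGGT+TCCTTA 1]
```

The field runs up to the next colon, so it picks up the space and the read number that follow the barcodes.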

You can take care of that at the end with this command, which replaces the spaces in the file names with underscores:

$ find . -type f -name "* *.fq" -exec bash -c 'mv "$0" "${0// /_}"' {} \;
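If files named by sample ID are wanted, as asked in the question, one alternative is to do the lookup directly in awk. The following is a rough sketch, not a tested pipeline: `demux_by_sheet` is a made-up helper, the sample sheet is assumed to be a tab-separated file with the columns sampleID, barcode1, barcode2, i7, the headers are assumed to be in the original bcl2fastq layout shown in the question, records are assumed to be 4 lines each, and only exact barcode matches are assigned.

```shell
# Hypothetical helper: demux_by_sheet SHEET FASTQ OUTDIR
# SHEET (assumed layout): sampleID <TAB> barcode1 <TAB> barcode2 <TAB> i7
demux_by_sheet() {
    mkdir -p "$3"
    awk -F'\t' -v outdir="$3" '
    NR == FNR {                          # first file: the sample sheet
        sample[$2 "+" $3 "+" $4] = $1    # key is barcode1+barcode2+i7
        next
    }
    FNR % 4 == 1 {                       # FASTQ header line: pick the output file
        split($0, part, " ")             # part[1] = read ID, part[2] = comment
        n = split(part[1], id, ":")      # id[n] = barcode1+barcode2
        m = split(part[2], cm, ":")      # cm[m] = i7
        key = id[n] "+" cm[m]
        name = (key in sample) ? sample[key] : "undetermined"
        out = outdir "/" name ".fastq"
    }
    { print > out }                      # all 4 lines of the record go to that file
    ' "$1" "$2"
}

# Usage (file names are placeholders):
#   demux_by_sheet samples.tsv reads_R1.fastq demuxed
```

Reads whose combination is not in the sheet go to `undetermined.fastq`. Some awk implementations limit the number of simultaneously open output files, so this may need adjusting for runs with many samples.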
modified 7 months ago • written 7 months ago by genomax

