Hi all,
I am trying to demultiplex an Illumina run in which I have introduced barcode sequences in a custom configuration:
R1 - 6 bp (barcode 1) + 144 bp
R2 - 6 bp (barcode 2) + 144 bp
index read - i7 (8 bp)
I have used bcl2fastq to introduce the sequences of each barcode (barcode1+barcode2+i7) in the header of the read.
Bcl2fastq options:
Read1StartFromCycle,7,,,,,,
Read2StartFromCycle,7,,,,,,
Read1UMILength,6,,,,,,
Read2UMILength,6,,,,,,
Read1UMIStartFromCycle,1,,,,,,
Read2UMIStartFromCycle,1,,,,,,
Fastq line example:
@M01913:344:000000000-CGVBP:1:1101:17206:1578:AACGGT+TCCTTA 1:N:0:CTAAGTCATG
CTTAACCCCTCCTCCCAGAGACCCCAGTTGCAAACCAGACCTCAGGCGGCTCATAGGGCACCACCACACTATGTCGAAAAGCGTTTCTGTCATCCAAATACTCCACACGCAAATTTCCTTCCACTCGGATAAGATGCTGAGGAGG
+
CCFFFFGGGGGGGGGGHHGHGHGHGGGHHHHHHHHGHHHGHHHGHHHGGGGGHHHHHHHGGGHHGHHHGHHHHHHHHGGGHHGGGGHHHHHHHHHHHHHHHHHHHHHGGGGGGHHHHHHHHHHHHHGGGGGGHHHHHHHFHGFFG
However, I can't find any suitable tool to further demultiplex these reads into individual fastq files corresponding to each unique barcode combination. Ideally, I would provide a sample sheet containing a sampleID and unique barcode combination (barcode1+barcode2+i7), and get individual fastq files named with the sampleID provided.
Any help/comments would be highly appreciated!
You can try demuxbyname.sh from the BBMap suite. Run the program without any options and read the in-line help. Give it a try and see if you can figure this out; otherwise I will do some more testing later when I have time.
A second suggestion is to skip moving the inline barcodes into the fastq headers by removing the bcl2fastq options you listed above, and then use sabre (https://github.com/najoshi/sabre) to demultiplex the data. This will definitely work.
Hi!
Thanks so much for the response. I'll give demuxbyname.sh a try, but my feeling is that I'll first have to reformat the headers so that all barcodes are in the right position, rather than where they are at the moment:
@M01913:344:000000000-CGVBP:1:1101:17206:1578:AACGGT+TCCTTA 1:N:0:CTAAGTCATG
Reformatting to something like this might work:
@M01913:344:000000000-CGVBP:1:1101:17206:1578 1:N:0:CTAAGTCATG+AACGGT+TCCTTA
I am not super familiar with awk/sed so I wouldn't know how to easily reformat the header in that sense. Any comments would be super welcome!
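For reference, the header rewrite described above can be sketched in Python rather than awk/sed. This is only a minimal sketch: it assumes every header looks exactly like the example (barcode1+barcode2 appended to the read ID after the last colon, then a space, then `read:filter:control:i7`), and the names `HEADER_RE`, `move_barcodes` and `reformat_fastq` are made up for illustration. Headers that do not match the pattern are passed through unchanged.

```python
import re

# Assumed header layout (taken from the example in this thread):
#   @<read id>:<barcode1>+<barcode2> <read>:<filter>:<control>:<i7>
HEADER_RE = re.compile(
    r"^(@\S+):([ACGTN]+)\+([ACGTN]+) ([12]:[YN]:\d+):([ACGTN]+)$"
)

def move_barcodes(header):
    """Move barcode1+barcode2 from the read ID to the end of the header."""
    header = header.rstrip("\n")
    m = HEADER_RE.match(header)
    if m is None:
        return header  # unexpected layout: leave the header untouched
    read_id, bc1, bc2, flags, i7 = m.groups()
    return f"{read_id} {flags}:{i7}+{bc1}+{bc2}"

def reformat_fastq(lines):
    """Yield FASTQ lines with every header (1st line of each 4) rewritten."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield move_barcodes(line) if i % 4 == 0 else line
```

To apply it to a file, something like `for line in reformat_fastq(open("in.fastq")): print(line)` would do, with the output redirected to a new file (`in.fastq` is a placeholder name).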
Unfortunately, sabre only supports the same barcode in forward and reverse reads for paired-end sequencing so that wouldn't work in this case (my R1 and R2 barcodes are always different).
Thanks!
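If no existing tool fits, the demultiplexing itself can also be done directly from the current headers without reformatting them first. The following is a rough sketch only: the sample sheet layout (tab-separated `sampleID`, barcode1, barcode2, i7) is a made-up convention, all function names are hypothetical, barcodes must match exactly (no mismatches allowed), and all reads are held in memory, so it is suited to small runs or as a starting point.

```python
import re
from collections import defaultdict

# Assumed header layout, as in the example read in this thread:
#   @<read id>:<barcode1>+<barcode2> <read>:<filter>:<control>:<i7>
HEADER_RE = re.compile(r"^@\S+:([ACGTN]+)\+([ACGTN]+) [12]:[YN]:\d+:([ACGTN]+)$")

def barcode_key(header):
    """Return (barcode1, barcode2, i7) from a header, or None if it does not match."""
    m = HEADER_RE.match(header.strip())
    return m.groups() if m else None

def load_sheet(rows):
    """Parse rows of 'sampleID<TAB>barcode1<TAB>barcode2<TAB>i7' (hypothetical layout)."""
    sheet = {}
    for row in rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comments
        sample_id, bc1, bc2, i7 = row.split("\t")
        sheet[(bc1, bc2, i7)] = sample_id
    return sheet

def demultiplex(lines, sheet, unassigned="unassigned"):
    """Group 4-line FASTQ records by sample; unknown combinations go to `unassigned`."""
    bins = defaultdict(list)
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            sample = sheet.get(barcode_key(record[0]), unassigned)
            bins[sample].append(record)
            record = []
    return bins

def write_bins(bins, suffix="_R1.fastq"):
    """Write one FASTQ file per sample, named <sampleID><suffix>."""
    for sample, records in bins.items():
        with open(sample + suffix, "w") as out:
            for rec in records:
                out.write("\n".join(rec) + "\n")
```

The same functions could be run on the R2 file with a different suffix, provided its headers carry the same three barcodes, which is worth checking on a few reads first.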