Question: demultiplex a dataset when you have barcodes as a separate fastq
2.0 years ago by
Denmark/University of Copenhagen
IP590 wrote:

Hi Biostars:

I have received raw sequencing data from a collaborator, and the data is not demultiplexed. What I usually see in the fastq files that I have to analyse and demultiplex is the following:

Barcode + sequence

And then one can use software such as barcode_splitter, or a tool from the FourCseq package, to demultiplex the samples.

However, now I have three fastq files, for example:

One for the left reads:

@JLK5VL1:840:HLKVHBCXX:1:1101:1489:2056 1:N:0:

One for the right reads:

@JLK5VL1:840:HLKVHBCXX:1:1101:1489:2056 3:N:0:

And a last file with the barcode associated with the above read pair. Note that the read identifier is the same in all three fastq files; only the read number field differs.

@JLK5VL1:840:HLKVHBCXX:1:1101:1489:2056 2:N:0:

Of course, I have a file with the barcode associated to each sample:

sample_6  GAGTGG    NA

I have tried to look for software to demultiplex a fastq file when you have the data in this format (left_read.fastq, right_read.fastq and barcodes.fastq); however, I have not been able to find anything. I feel that I could solve this with python using pysam, but since my collaborator is not a bioinformatician, I guess that there must be a tool for handling this.

So, long story short: is there a tool for demultiplexing datasets that are in the format left_reads.fastq, right_reads.fastq, barcodes.fastq?

best, and thanks for reading

modified 15 months ago by genomax70k • written 2.0 years ago by IP590

Ask them to have whoever did the sequencing demultiplex the files. The three files you're getting are the output of the demultiplexing software, but whoever ran it explicitly requested that output, since the default would be to demultiplex everything into separate files (i.e., what you and everyone else in the world actually wants). Don't waste time on this; have the person who produced the files do so correctly.

written 2.0 years ago by Devon Ryan91k

If that is the answer, I assume that they have done something wrong. This is not a standard format for providing the data, right?

Whatever your answer is, thanks for replying.

modified 2.0 years ago • written 2.0 years ago by IP590

There have been variations of the Qiime (metagenomics) pipeline over the years where the barcode was expected to be in a separate file (which is what you have). The Qiime package may have a utility program to demultiplex this data. Take a look there.

The provider has not done "something wrong" (especially if this was what was requested), but they can easily fix this (provided this is not an old dataset) and give you properly demultiplexed files.

modified 2.0 years ago • written 2.0 years ago by genomax70k

Correct, they specified the --create-fastq-for-index-reads option and apparently didn't use a sample sheet. They just need to not specify that option and to use a sample sheet. Simply email those two sentences to them.

written 2.0 years ago by Devon Ryan91k
2.0 years ago by
Charles Plessy2.7k wrote:

If you do not find a program for demultiplexing three files at a time, perhaps you can append the barcodes at the beginning of the "left" reads, and then run a paired-end demultiplexer such as TagDust 2?

For an example on how to run TagDust 2, you can look at my tutorial on GitHub.

For how to paste the barcodes, maybe you can follow the example below:

$ cat toto.fq 

$ perl -nE '++$i % 2 == 0 ? print : say ""' toto.fq | paste -d '' - toto.fq 
written 2.0 years ago by Charles Plessy2.7k
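
The barcode-pasting step suggested above can also be sketched in Python, for anyone who prefers that to the perl/paste one-liner. This is only a minimal sketch using the standard library: the function names `read_fastq` and `prepend_barcodes` are made up for illustration, and it assumes strict four-line FASTQ records with the reads and barcodes in the same order in both files.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a FASTQ handle.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = record
        yield header, seq, qual


def prepend_barcodes(reads, barcodes, out):
    """Write the reads to `out` with each barcode pasted in front.

    The barcode bases go in front of the read bases and the barcode
    qualities in front of the read qualities, so lengths stay in sync.
    Note: zip() stops at the shorter file without complaining.
    """
    for (r_head, r_seq, r_qual), (b_head, b_seq, b_qual) in zip(
        read_fastq(reads), read_fastq(barcodes)
    ):
        # Compare only the part before the first space (the read ID);
        # the read number field after it differs between the files.
        if r_head.split()[0] != b_head.split()[0]:
            raise ValueError(f"read ID mismatch: {r_head} vs {b_head}")
        out.write(f"{r_head}\n{b_seq}{r_seq}\n+\n{b_qual}{r_qual}\n")
```

It would be called with three open file handles, for example `prepend_barcodes(open("left_reads.fastq"), open("barcodes.fastq"), open("left_with_barcodes.fastq", "w"))`, after which the output and the untouched right reads can go to a paired-end demultiplexer.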
2.0 years ago by
lelle800 wrote:

I agree with Devon Ryan that it is probably easiest to get the data in the format you want from your sequencing provider. If that is not possible, you can use Flexbar, which supports separate barcode reads.

written 2.0 years ago by lelle800

I only looked cursorily at the Flexbar page. Are you sure it can handle the situation here (where the barcode reads are in a separate file)? It does not seem to be the case from my quick look.

written 2.0 years ago by genomax70k

Yes, with the -br option. I am not sure if it works when you have two barcode read files.

written 2.0 years ago by lelle800
15 months ago by
United States
genomax70k wrote:

The answer in the thread "Demultiplexing Illumina data" has a solution for this. I am posting it here to create a cross-reference.

written 15 months ago by genomax70k
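
If re-running the demultiplexing at the sequencing facility is not an option, the three-file layout from the question (left reads, right reads, barcodes, plus a sample/barcode table) can in principle be split directly. The following is only a rough Python sketch under several assumptions, not a replacement for a proper tool: it uses the standard library rather than pysam, all function names and the `undetermined` bin are hypothetical, the three files must list the reads in the same order with four lines per record, and barcodes are matched exactly (no mismatches tolerated).

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = record
        yield header, seq, qual


def load_sample_table(handle):
    """Map barcode -> sample from lines like 'sample_6  GAGTGG    NA'."""
    table = {}
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            table[fields[1]] = fields[0]
    return table


def demultiplex(left, right, barcodes, barcode_to_sample, open_output):
    """Split a read pair stream by barcode; return read-pair counts per sample.

    `open_output(sample, mate)` must return a writable handle; the caller
    is responsible for closing the handles it hands out.
    """
    outputs = {}
    counts = {}
    for l_rec, r_rec, b_rec in zip(
        read_fastq(left), read_fastq(right), read_fastq(barcodes)
    ):
        # All three records must share the read ID (text before the space).
        ids = {rec[0].split()[0] for rec in (l_rec, r_rec, b_rec)}
        if len(ids) != 1:
            raise ValueError(f"read IDs out of sync: {sorted(ids)}")
        # Exact barcode match only; anything else goes to 'undetermined'.
        sample = barcode_to_sample.get(b_rec[1], "undetermined")
        for mate, rec in (("R1", l_rec), ("R2", r_rec)):
            key = (sample, mate)
            if key not in outputs:
                outputs[key] = open_output(sample, mate)
            outputs[key].write("{}\n{}\n+\n{}\n".format(*rec))
        counts[sample] = counts.get(sample, 0) + 1
    return counts
```

A caller could pass something like `lambda sample, mate: open(f"{sample}_{mate}.fastq", "w")` as `open_output`. Checking the returned counts, in particular the size of the `undetermined` bin, is a quick sanity test that the barcode table matches the data.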