Demultiplexing tool for dual-indexed paired-end Illumina libraries
8.2 years ago
Floydian_slip ▴ 170

Hi, what are some of the tools available for demultiplexing dual-indexed Illumina libraries where different combinations of i7 and i5 indices are used for paired-end data? I have already tried fastq_multx and encountered an error.

Thanks!

demultiplexing • 8.4k views
8.2 years ago
GenoMax 141k

What sort of data do you have? Fastq files or BCL files?

Take a look at the demuxbyname.sh tool from the BBMap suite.

With BCL files you could look at IlluminaBasecallsToFastq from Picard.


I have one set of paired-end multiplexed fastq files (after converting the BCL files to one giant paired-end fastq).


Then use demuxbyname.sh from BBMap.

$ demuxbyname.sh in=r#.fq out=out_%_#.fq prefixmode=f names=GGACTCCT+GCGATCTA,TAAGGCGA+TCTACTCT,... outu=filename

"Names" can also be a text file with one barcode per line (in exactly the format found in the read header). You do have to include all of the expected barcodes, though.

In the output filename, the "%" symbol gets replaced by the barcode; in both the input and output names, the "#" symbol gets replaced by 1 or 2 for read 1 or read 2. It's optional, though; you can leave it out for interleaved input/output, or specify in1=/in2=/out1=/out2= if you want custom naming.
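To make the placeholder behaviour concrete, here is a minimal Python sketch of the idea. It is not BBMap's implementation; the function names `barcode_from_header` and `expand_name` are hypothetical, and the example header is made up. It assumes the barcode is the text after the last colon in an Illumina (CASAVA 1.8+) read header.

```python
def barcode_from_header(header: str) -> str:
    """Return the barcode field from an Illumina (CASAVA 1.8+) read header.

    Assumes the header looks like
    '@instrument:run:flowcell:lane:tile:x:y 1:N:0:GGACTCCT+GCGATCTA',
    i.e. the barcode is the text after the last colon.
    """
    return header.rstrip().rsplit(":", 1)[-1]


def expand_name(template: str, barcode: str, read: int) -> str:
    """Fill in '%' with the barcode and '#' with the read number (1 or 2)."""
    return template.replace("%", barcode).replace("#", str(read))


# Made-up header for illustration only.
header = "@M01234:56:000000000-ABCDE:1:1101:15589:1332 1:N:0:GGACTCCT+GCGATCTA"
barcode = barcode_from_header(header)
print(expand_name("out_%_#.fq", barcode, 1))
print(expand_name("out_%_#.fq", barcode, 2))
```

With this header, the two reads of the pair would land in files named after the `GGACTCCT+GCGATCTA` barcode with `_1` and `_2` suffixes.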

8.2 years ago
Gabriel R. ★ 2.9k

You can use our maximum-likelihood demultiplexing tool. Our paper is here:

http://www.ncbi.nlm.nih.gov/pubmed/25359895

the website with the software is here: https://grenaud.github.io/deML/

Hope this helps, contact me if you have trouble running it.


Hi Gabriel, I tried deML and ran into an error. My barcodes file is:

Index1 Index2 Name
CCCAACCT CTAATCGA NA12877_A1
CACCACAC CTAATCGA NA12877_A2
GAAACCCA CTAATCGA NA12877_A3
TGTGACCA CTAATCGA NA12877_B1
AGGGTCAA CTAATCGA NA12877_B2
AGGAGTGG CTAATCGA NA12877_B3
CCCAACCT CTAGAACA NA12878_A1
CACCACAC CTAGAACA NA12878_A2
GAAACCCA CTAGAACA NA12878_A3
TGTGACCA CTAGAACA NA12878_B1
AGGGTCAA CTAGAACA NA12878_B2
AGGAGTGG CTAGAACA NA12878_B3

I ran the command: $ deML -i index.txt -f Undetermined_S0_L001_R1_001.fastq.gz -r Undetermined_S0_L001_R2_001.fastq.gz

The error was: If fastq is used, the forward read must be specified

If the indices are already in index.txt, what is contained in -if1 and -if2? I don't have any more fastq files from Illumina. Thanks!


The "-if1" and "-if2" options take the fastq files containing the index 1 and index 2 sequences of the reads, respectively.

BTW, if you want to simplify your processing, I suggest transforming your BCL directly to aligned BAM:

https://github.com/grenaud/BCL2BAM2FASTQ

Recent versions of samtools provide commands for transforming to fastq if need be.


Hi Gabriel, I don't have these fastq files. Can I quickly create them using the indices that I have from the sample sheet? Thanks!


No, these index files hold the indices from your reads, not from your samples. Are the indices for your reads in the definition lines (starting with @) in Undetermined_S0_L001_R1_001.fastq.gz, or do you have other files besides those?


Based on this other thread it appears to be the case: demultiplexing Illumina output with fastq_multx


OK, well then, @nbhardwaj, you could copy these indices into their own fastq files and add quality scores that represent an average error rate.
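One way this could be done is sketched below in Python. It is an illustration under stated assumptions, not part of deML: the function name `index_fastqs_from_headers` is hypothetical, every header is assumed to end in `:<index1>+<index2>`, and the constant Phred score of 30 is an arbitrary stand-in for an average error rate that you would choose yourself.

```python
import io


def index_fastqs_from_headers(reads, out_i1, out_i2, phred=30):
    """Write one index-1 and one index-2 FASTQ record per input read.

    Assumes every header ends in ':<index1>+<index2>' and that a single
    constant Phred score is acceptable as a stand-in for real qualities.
    """
    qual_char = chr(phred + 33)  # Phred+33 encoding
    for line_no, line in enumerate(reads):
        if line_no % 4 != 0:      # only header lines carry the indices
            continue
        header = line.rstrip("\n")
        name = header.split()[0]  # read ID without the comment field
        index1, index2 = header.rsplit(":", 1)[-1].split("+")
        for out, seq in ((out_i1, index1), (out_i2, index2)):
            out.write(f"{name}\n{seq}\n+\n{qual_char * len(seq)}\n")


# Tiny in-memory example; the read itself is made up.
example = io.StringIO(
    "@read1 1:N:0:CCCAACCT+CTAATCGA\n"
    "ACGTACGT\n"
    "+\n"
    "IIIIIIII\n"
)
i1, i2 = io.StringIO(), io.StringIO()
index_fastqs_from_headers(example, i1, i2)
print(i1.getvalue(), end="")
print(i2.getvalue(), end="")
```

For real data, the in-memory objects could be swapped for file handles, for example `gzip.open("Undetermined_S0_L001_R1_001.fastq.gz", "rt")` for the input, and the two outputs could then be passed to deML as `-if1` and `-if2`.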
