Demultiplexing based on dual indices in headers while allowing 1 mismatch to each index
4.4 years ago
Rezenman • 0

Hey all, I am looking for a tool to help me demultiplex my NovaSeq samples using the dual indices in the read headers. I designed my indices so that the minimum Hamming distance between them is 3, so I want to allow one mismatch per index while demultiplexing in order to salvage as many reads as possible. Up to now I have used demuxbyname from BBMap, but it does not allow any mismatches. Any help will be appreciated :)

Fastq header example: @A00929:83:HL75TDRXX:1:2101:13431:1047 2:N:0:AGGCAGAA+NCTCTCCG
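
To make the request concrete, the matching rule can be sketched in a few lines of shell and awk. With a minimum Hamming distance of 3 between indices, a read index carrying one mismatch is at distance 1 from its true index and at least 2 from every other one, so accepting a distance of at most 1 cannot produce an ambiguous assignment. Everything named here is an assumption made for illustration and not part of any existing tool: the function name `assign_reads`, a tab-separated sample sheet (i7 index, i5 index, sample name), and the index pair being the last colon-separated field of the header. An `N` in the read index simply counts as one mismatch.

```shell
# Sketch: assign each FASTQ record to a sample, allowing <= 1 mismatch per index.
# Usage (hypothetical file names): zcat reads.fq.gz | assign_reads samples.tsv
assign_reads() {
  awk -v sheet="$1" '
    # Number of differing positions; unequal lengths never match.
    function hamming(a, b,    i, d) {
      if (length(a) != length(b)) return 99
      d = 0
      for (i = 1; i <= length(a); i++)
        if (substr(a, i, 1) != substr(b, i, 1)) d++
      return d
    }
    BEGIN {
      # Load the sample sheet: i7 <TAB> i5 <TAB> sample name.
      while ((getline line < sheet) > 0) {
        split(line, f, "\t")
        n++; i7[n] = f[1]; i5[n] = f[2]; name[n] = f[3]
      }
    }
    NR % 4 == 1 {
      # The index pair (i7+i5) is the last colon-separated field of the header.
      k = split($0, h, ":")
      split(h[k], idx, "[+]")
      hit = "unassigned"
      for (s = 1; s <= n; s++)
        if (hamming(idx[1], i7[s]) <= 1 && hamming(idx[2], i5[s]) <= 1) {
          hit = name[s]
          break
        }
      # Print the read ID and the sample it was assigned to.
      print $1 "\t" hit
    }
  '
}
```

The output is a two-column table (read ID, sample name or `unassigned`) that could then be used to split the FASTQ file. Stopping at the first hit is only safe because of the distance-3 design; with a weaker index set, all hits would have to be collected and ties rejected.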

next-gen sequencing • 4.5k views

This is easily done by bcl2fastq when the data is originally demultiplexed. You may want to ask your sequence provider about this next time.

Hey, thanks for the reply. In my pipeline I usually process the FASTQ files first (trimming, quality control, etc.) and demultiplex only as the last step. So (and I forgot to mention this) I am looking for a tool that works on FASTQ files. Thanks

4.4 years ago
Gabriel R. ★ 2.9k

We have published a program called deML:

https://grenaud.github.io/deML/

See our paper:

Renaud G, Stenzel U, Maricic T, Wiebe V, Kelso J. deML: robust demultiplexing of Illumina sequences using a likelihood-based approach. Bioinformatics. 2015;31(5):770-772. doi:10.1093/bioinformatics/btu719

It handles partial matches and uses maximum likelihood to assign reads to the different samples. It also reports which indices were left unassigned or were not found. It is robust to sequences of poor quality and to a poorly designed index list. It handles dual indices too.

Thanks, I'll look into that.

Let me know if you need help!

Hey Gabriel, can it handle index sequences that are found in the headers (such as the format I posted above)?

Normally, you should put them as a separate file. However, you can transform your indices in the header into separate files easily using UNIX commands. If you need a hand, paste a few sequences with the headers and I can have a look.

Rezenman: use the code in my answer here to get a listing of all indexes present and their counts: C: Demultiplexing reads with index present in the labels
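
The linked answer is not reproduced in this thread. As a stand-in, here is a minimal sketch of such a tally, assuming (as in the header example above) that the index pair is the last colon-separated field of each header line. The function name `count_indexes` is made up for illustration.

```shell
# Sketch: count how often each index pair occurs in the FASTQ headers.
# Usage (hypothetical file name): zcat reads.fq.gz | count_indexes
count_indexes() {
  # Take every 4th line starting at line 1 (the header), keep its last
  # colon-separated field, then tally and sort by descending count.
  awk 'NR % 4 == 1 { n = split($0, h, ":"); print h[n] }' \
    | sort | uniq -c | sort -k1,1nr
}
```

Index pairs with high counts that are within one mismatch of an expected pair are the reads a mismatch-tolerant demultiplexer would recover.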

Hey Gabriel, I am adding some sequences for you to look at. Just to make sure I understand correctly: in which format should I provide the index files? Thanks a lot!

@A00929:83:HL75TDRXX:1:2101:13919:1047 1:N:0:GTAGAGGA+NATCCTCT
CATATTGATAGTTCGCACAGGTAG
+
FFFFFFFFFFFFFFFFFFFFFFFF
@A00929:83:HL75TDRXX:1:2101:14009:1047 1:N:0:AGGCAGAA+NCTCTCCG
GTGCGTATCTATCAAAAATGTATA
+
FFFFFFFFFFFFFFFFFFFFFFFF
@A00929:83:HL75TDRXX:1:2101:14027:1047 1:N:0:GTAGAGGA+NATCCTCT
AAAACCCTGGCCAACATTGAAGGT
+
FFFFFFFFFFFFFFFFFFFFFFFF

I copied your file as "biostars.fq.gz"

First, extract the first index of each read:

zcat biostars.fq.gz  |awk '{if( (NR%4)==1){ print $0; print substr($2,length($2)-16,8); print "+"; print "FFFFFFFF"} }' |gzip  > index1.fq.gz

Then extract the second index:

zcat biostars.fq.gz  |awk '{if( (NR%4)==1){ print $0; print substr($2,length($2)-7,8); print "+"; print "FFFFFFFF"} }' |gzip  > index2.fq.gz

I used 'F' as the quality score for every index base, which is Phred 37 in the Phred+33 encoding, the usual top score on a NovaSeq.
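
As a quick check of that arithmetic: in Phred+33 encoding the score is the ASCII code of the character minus 33, and 'F' has code 70. The helper name `phred33` below is only illustrative.

```shell
# Convert one quality character to its Phred score, assuming Phred+33 encoding.
phred33() {
  code=$(printf '%d' "'$1")   # ASCII code of the character, e.g. 70 for F
  echo $(( code - 33 ))
}

phred33 F   # prints 37
```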

Then run deML (adjust the paths to wherever you saved the files):

deML -o /tmp/output -i [PATH TO YOUR ]index.txt -f /tmp/biostars.fq.gz -if1 /tmp/index1.fq.gz -if2 /tmp/index2.fq.gz

Let me know if that works for you

Hey, I had a similar problem as the OP and found your comment. I ran your deML on one sample to test it out and it seems to work well. However, I have 8 fastq files (4 forward and 4 reverse) that I need to run this on. Is this program set up such that it can run in parallel or would I need to use a for loop? Thanks for any info you can provide!

You can use a for loop or start processes in parallel if you have the hardware to support this.
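
A minimal sketch of the loop approach, under stated assumptions: the file naming (`*_R1.fq.gz` for reads, `*_I1.fq.gz` and `*_I2.fq.gz` for the extracted index files), the shared `index.txt`, and the function name `deml_commands` are all hypothetical. Only the deML options already shown in this thread are used; how to pass the reverse-read file is not covered here and should be taken from deML's own help. The function prints the commands instead of running them, so they can be inspected first.

```shell
# Sketch: print one deML command per forward FASTQ file given as an argument.
deml_commands() {
  for fwd in "$@"; do
    base=${fwd%_R1.fq.gz}   # strip the (assumed) read-1 suffix
    printf 'deML -o %s_demux -i index.txt -f %s -if1 %s_I1.fq.gz -if2 %s_I2.fq.gz\n' \
      "$base" "$fwd" "$base" "$base"
  done
}
```

Piping the output to `sh` runs the jobs one after another. With an xargs that supports `-P` (GNU and BSD versions do), something like `deml_commands *_R1.fq.gz | xargs -P 4 -I{} sh -c '{}'` would run up to four at a time, provided the machine has the cores and memory for it.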

GenoMax has a good comment above. Feel free to email me if you need further help.

Hey all, thanks for your help, it worked well!

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.