Question

Demultiplexing issue using fastq-multx

16 months ago

y.sevellec ▴ 10

Hi!

I'm having an issue demultiplexing a FASTQ file containing multiple samples from a NextSeq run. We had dual indexes and dual barcodes for each sample. Demultiplexing on the indexes (i5 and i7) worked properly, but I still need to demultiplex on barcode_5 and barcode_7. The read layout looks like this:

[image: read layout diagram, not available]

The "NNNNNN" sequences are the barcodes I have to demultiplex on to get individual samples.

I have a list of barcodes per sample. Here are some examples:

Sample_ID  Barcode_5  Sequence 5'-3'  Barcode_7  Sequence 5'-3'
A_1        501        TCCGATAT        701        GCTCATTAT
A_2        502        CGGAGATA        701        GCTCATTAT

I saw that the NextSeq reads the barcode as its reverse complement, so I tried to demultiplex using the following:

A_1 TCCGATAT-ATAATGAGC

A_2 CGGAGATA-ATAATGAGC

but all the reads went to the unmatched FASTQ file. I then tried to use the exact sequences I was given:

A_1 TCCGATAT-GCTCATTAT

A_2 CGGAGATA-GCTCATTAT

but it didn't work either...
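For reference, the reverse complement used in the first attempt can be checked with a short shell sketch. The `revcomp` function name is made up for this illustration; it assumes plain A/C/G/T sequences and that `rev` and `tr` are available.

```shell
# Reverse-complement a DNA sequence: reverse the string, then swap A<->T and C<->G.
# "revcomp" is a hypothetical helper name, not part of any demultiplexing tool.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp GCTCATTAT   # barcode 701 from the list above; prints ATAATGAGC
```

This confirms that ATAATGAGC is the reverse complement of GCTCATTAT, so the barcode sequences themselves were converted correctly in the first attempt.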

I can't figure out what I did wrong... Is there another tool that can demultiplex this kind of dual barcoded reads?

fastq demultiplexing dual barcode • 948 views

updated 16 months ago by GenoMax 141k • written 16 months ago by y.sevellec ▴ 10

Is there another tool that can demultiplex this kind of dual barcoded reads?

First separate the files based on the Illumina indexes using standard Illumina demultiplexing. Then use sabre (LINK) to further de-multiplex your files based on the barcodes.

16 months ago by GenoMax 141k

Thank you for your reply. Is sabre able to find two barcodes at once? It seems to me you have to use one barcode at a time...

16 months ago by y.sevellec ▴ 10

You are probably going to need to do this sequentially, in multiple rounds. Perhaps someone else has a suggestion for software that can handle two barcodes at the same time.

16 months ago by GenoMax 141k

score 1 · Answer 1 · 2022-12-01

By two barcodes at once, you mean the two inner barcodes ("NNNNNN", barcode_5 and barcode_7), right? Cutadapt actually supports demultiplexing in addition to trimming, and I think it might be able to do this. In particular, see the section on combinatorial dual indexes in the Cutadapt documentation.

So for example, if you have read pairs like this (just using FASTA for a dumb example):

r1.fasta:

>seq1
GGGGGGGACTGACTG
>seq2
GGGGGGGAGAGACGC
>seq3
TTTTTTTCTCTGAGA
>seq4
TTTTTTTCAAACTTA

r2.fasta:

>seq1
AAAAAAACAGTCAGT
>seq2
GGGGGGGGCGTCTCT
>seq3
AAAAAAATCTCAGAG
>seq4
GGGGGGGTAAGTTTG

...and barcodes like this...

barcodes_fwd.fasta:

>fwd1
GGGGGGG
>fwd2
TTTTTTT

barcodes_rev.fasta:

>rev1
AAAAAAA
>rev2
GGGGGGG

You can do this:

$ cutadapt -e 0.15 --no-indels -g ^file:barcodes_fwd.fasta -G ^file:barcodes_rev.fasta \
    -o {name1}-{name2}.1.fasta -p {name1}-{name2}.2.fasta \
    r1.fasta r2.fasta

... and each of those four pairs of sequences will end up in a separate pair of files named like fwd1-rev1.1.fasta etc.
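To adapt this to the barcodes from the question, the two barcode FASTA files could be generated from a sample sheet. The following is only a sketch: the file name `samples.tsv` and its tab-separated column layout are assumptions made for illustration, filled with the two example samples from the question.

```shell
# Hypothetical tab-separated sample sheet built from the two samples in the question.
# Columns: sample ID, barcode_5 name, barcode_5 sequence, barcode_7 name, barcode_7 sequence
printf 'A_1\t501\tTCCGATAT\t701\tGCTCATTAT\n'  > samples.tsv
printf 'A_2\t502\tCGGAGATA\t701\tGCTCATTAT\n' >> samples.tsv

# Write one FASTA record per distinct barcode.
# sort -u drops duplicates, since several samples can share a barcode (701 here).
cut -f 2,3 samples.tsv | sort -u | awk -F '\t' '{ print ">" $1; print $2 }' > barcodes_fwd.fasta
cut -f 4,5 samples.tsv | sort -u | awk -F '\t' '{ print ">" $1; print $2 }' > barcodes_rev.fasta

cat barcodes_fwd.fasta barcodes_rev.fasta
```

With these files, the cutadapt command above would name its output files by barcode pair (501-701, 502-701), which can be mapped back to sample IDs using the sheet. Whether the barcodes need to be reverse-complemented first, and whether they really sit at the very start of each read (which is what the `^` anchor assumes), depends on the library layout and cannot be determined from the question alone.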