Demultiplexing issue using fastq-multx
8 weeks ago
y.sevellec ▴ 10

Hi!

I'm having an issue demultiplexing a FASTQ file containing multiple samples from a NextSeq run. We had dual indexes and dual barcodes for each sample. The demultiplexing on the indexes (i5 and i7) worked properly, but I still need to demultiplex on barcode_5 and barcode_7. It goes like this:

The "NNNNNN" sequences are the barcodes I have to demultiplex on to get individual samples.

I have a list of barcodes per sample. Here are some examples:

Sample_ID  Barcode_5  Sequence 5'-3'  Barcode_7  Sequence 5'-3'
A_1        501        TCCGATAT        701        GCTCATTAT
A_2        502        CGGAGATA        701        GCTCATTAT

I saw that NextSeq uses the reverse complement of the barcode, so I tried to demultiplex using the following:

A_1 TCCGATAT-ATAATGAGC

A_2 CGGAGATA-ATAATGAGC

But all the reads went to the unmatched FASTQ file. I then tried the exact sequences I was given:

A_1 TCCGATAT-GCTCATTAT

A_2 CGGAGATA-GCTCATTAT

but it didn't work either...

I can't figure out what I did wrong... Is there another tool that can demultiplex this kind of dual barcoded reads?
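As a sanity check on the first attempt, here is a minimal Python sketch that reverse-complements Barcode_7 and prints barcode lines in the same "ID SEQ1-SEQ2" form used above. The helper names (`reverse_complement`, `barcode_lines`) are made up for illustration. It only confirms the sequence arithmetic; it says nothing about which orientation the demultiplexing tool actually expects.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Barcodes copied from the sample list in the question:
# sample ID -> (Barcode_5 sequence, Barcode_7 sequence).
samples = {
    "A_1": ("TCCGATAT", "GCTCATTAT"),
    "A_2": ("CGGAGATA", "GCTCATTAT"),
}


def barcode_lines(samples, revcomp_second):
    """Build 'ID SEQ1-SEQ2' lines, optionally reverse-complementing SEQ2.

    Hypothetical helper: the line format mirrors the barcode lists shown
    in the question, nothing more.
    """
    lines = []
    for sample_id, (bc5, bc7) in samples.items():
        second = reverse_complement(bc7) if revcomp_second else bc7
        lines.append(f"{sample_id} {bc5}-{second}")
    return lines


# Print both variants that were tried in the question.
for line in barcode_lines(samples, revcomp_second=True):
    print(line)
for line in barcode_lines(samples, revcomp_second=False):
    print(line)
```

Both variants come out identical to the two lists above, so the reverse complement itself was computed correctly and the mismatch presumably lies elsewhere.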

fastq demultiplexing dual barcode • 416 views

Is there another tool that can demultiplex this kind of dual barcoded reads?

First separate the files based on the Illumina indexes using standard Illumina demultiplexing. Then use sabre (LINK) to further de-multiplex your files based on the barcodes.


Thank you for your reply. Is sabre able to find two barcodes at once? It seems to me you have to use one barcode at a time...


You are probably going to need to do this sequentially, in multiple rounds. Perhaps someone else can suggest software that handles two barcodes at the same time.

8 weeks ago
Jesse ▴ 450

By two barcodes at once, you mean the two inner barcodes ("NNNNNN", barcode_5 and barcode_7), right? Cutadapt actually supports demultiplexing in addition to trimming, and I think it might be able to do this. In particular, see the bit about combinatorial dual indexes.

So for example if you have a read pair like this (just using FASTA for a dumb example)

r1.fasta:

>seq1
GGGGGGGACTGACTG
>seq2
GGGGGGGAGAGACGC
>seq3
TTTTTTTCTCTGAGA
>seq4
TTTTTTTCAAACTTA


r2.fasta:

>seq1
AAAAAAACAGTCAGT
>seq2
GGGGGGGGCGTCTCT
>seq3
AAAAAAATCTCAGAG
>seq4
GGGGGGGTAAGTTTG


...and barcodes like this...

barcodes_fwd.fasta:

>fwd1
GGGGGGG
>fwd2
TTTTTTT


barcodes_rev.fasta:

>rev1
AAAAAAA
>rev2
GGGGGGG


You can do this:

$ cutadapt -e 0.15 --no-indels -g ^file:barcodes_fwd.fasta -G ^file:barcodes_rev.fasta \
-o {name1}-{name2}.1.fasta -p {name1}-{name2}.2.fasta \
r1.fasta r2.fasta


... and each of those four pairs of sequences will end up in a separate pair of files named like fwd1-rev1.1.fasta etc.
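To see why those four pairs separate, here is a rough pure-Python sketch of the assignment logic on the toy data above. This is not how cutadapt is implemented. It does exact prefix matching only (no `-e` error tolerance), the function names are invented for illustration, and the "unknown" label for unmatched reads is an assumption of this sketch.

```python
def parse_fasta(text):
    """Parse FASTA text into an ordered list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def match_prefix(seq, barcodes):
    """Return the name of the first barcode that is an exact prefix of seq."""
    for name, barcode in barcodes:
        if seq.startswith(barcode):
            return name
    return None


def assign_pairs(r1, r2, fwd, rev):
    """Map each read name to the '{name1}-{name2}' label for its pair."""
    labels = {}
    for (name1, seq1), (name2, seq2) in zip(r1, r2):
        if name1 != name2:
            raise ValueError(f"read names out of sync: {name1} vs {name2}")
        fwd_name = match_prefix(seq1, fwd) or "unknown"  # placeholder label
        rev_name = match_prefix(seq2, rev) or "unknown"  # placeholder label
        labels[name1] = f"{fwd_name}-{rev_name}"
    return labels


# Toy data copied from the example files above.
R1 = ">seq1\nGGGGGGGACTGACTG\n>seq2\nGGGGGGGAGAGACGC\n>seq3\nTTTTTTTCTCTGAGA\n>seq4\nTTTTTTTCAAACTTA\n"
R2 = ">seq1\nAAAAAAACAGTCAGT\n>seq2\nGGGGGGGGCGTCTCT\n>seq3\nAAAAAAATCTCAGAG\n>seq4\nGGGGGGGTAAGTTTG\n"
FWD = ">fwd1\nGGGGGGG\n>fwd2\nTTTTTTT\n"
REV = ">rev1\nAAAAAAA\n>rev2\nGGGGGGG\n"

labels = assign_pairs(parse_fasta(R1), parse_fasta(R2), parse_fasta(FWD), parse_fasta(REV))
for read_name, label in labels.items():
    print(read_name, label)
```

Each read pair gets a different forward/reverse combination, which is why the four pairs land in four separate pairs of output files.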


This is exactly what I did in the end and it worked wonderfully.

Thank you very much for the tips!
