Demultiplexing issue using fastq-multx
8 weeks ago
y.sevellec ▴ 10

Hi!

I'm having an issue demultiplexing a FASTQ file containing multiple samples from a NextSeq run. We had dual indexes and dual barcodes for each sample. Demultiplexing on the indexes (i5 and i7) worked properly, but I still need to demultiplex on barcode_5 and barcode_7. The read layout goes like this:

[image: diagram of the read layout, not preserved]

The "NNNNNN" sequences are the barcodes I have to demultiplex on to get individual samples.

I have a list of barcodes per sample. Here are some examples:

Sample_ID  Barcode_5  Sequence 5'-3'  Barcode_7  Sequence 5'-3'
A_1        501        TCCGATAT        701        GCTCATTAT
A_2        502        CGGAGATA        701        GCTCATTAT

I saw that the NextSeq uses the reverse complement of the barcode, so I tried to demultiplex using the following:

A_1 TCCGATAT-ATAATGAGC

A_2 CGGAGATA-ATAATGAGC

but all the reads went to the unmatched FASTQ. I then tried the exact sequences I was given:

A_1 TCCGATAT-GCTCATTAT

A_2 CGGAGATA-GCTCATTAT

but it didn't work either...
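For anyone wanting to double-check the reverse complement used in the first attempt, here is a quick shell sketch. It assumes `rev` and `tr` are available, and `revcomp` is just a helper name made up for this example:

```shell
# Reverse-complement a DNA sequence: reverse the string, then swap A<->T and C<->G.
# "revcomp" is a hypothetical helper name, not part of any tool mentioned here.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp GCTCATTAT   # prints ATAATGAGC
```

So ATAATGAGC is indeed the reverse complement of the Barcode_7 sequence in the table.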

I can't figure out what I did wrong... Is there another tool that can demultiplex this kind of dual barcoded reads?
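A simple way to see which sequences actually sit at the start of the reads is to count the most common read prefixes. This is only a rough diagnostic sketch: it assumes an uncompressed FASTQ with exactly four lines per record, and `top_prefixes` and the file name in the usage comment are made up for illustration:

```shell
# Count the most frequent prefixes of the sequence lines in a 4-line-per-record FASTQ.
# $1 = FASTQ file, $2 = prefix length (defaults to 8). Prints the top 10 as "count prefix".
top_prefixes() {
    awk -v n="${2:-8}" 'NR % 4 == 2 { print substr($0, 1, n) }' "$1" \
        | sort | uniq -c | sort -rn | head -n 10
}

# Usage (hypothetical file name):
# top_prefixes sample_R1.fastq 8
```

If the expected barcodes, or their reverse complements, do not show up near the top of that list, the barcodes are probably not at the position the demultiplexer is looking at.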

fastq demultiplexing dual barcode • 416 views

Is there another tool that can demultiplex this kind of dual barcoded reads?

First separate the files based on the Illumina indexes using standard Illumina demultiplexing. Then use sabre (LINK) to further de-multiplex your files based on the barcodes.


Thank you for your reply. Is sabre able to find two barcodes at once? It seems to me you have to use one barcode at a time...


You will probably need to do this sequentially, in multiple rounds. Perhaps someone else can suggest software that handles two barcodes at the same time.

8 weeks ago
Jesse ▴ 450

By two barcodes at once, you mean the two inner barcodes ("NNNNNN", barcode_5 and barcode_7), right? Cutadapt actually supports demultiplexing in addition to trimming, and I think it might be able to do this. In particular, see the bit in its documentation about combinatorial dual indexes.

So for example if you have a read pair like this (just using FASTA for a dumb example)

r1.fasta:

>seq1
GGGGGGGACTGACTG
>seq2
GGGGGGGAGAGACGC
>seq3
TTTTTTTCTCTGAGA
>seq4
TTTTTTTCAAACTTA

r2.fasta:

>seq1
AAAAAAACAGTCAGT
>seq2
GGGGGGGGCGTCTCT
>seq3
AAAAAAATCTCAGAG
>seq4
GGGGGGGTAAGTTTG

...and barcodes like this...

barcodes_fwd.fasta:

>fwd1
GGGGGGG
>fwd2
TTTTTTT

barcodes_rev.fasta:

>rev1
AAAAAAA
>rev2
GGGGGGG

You can do this:

$ cutadapt -e 0.15 --no-indels -g ^file:barcodes_fwd.fasta -G ^file:barcodes_rev.fasta \
    -o {name1}-{name2}.1.fasta -p {name1}-{name2}.2.fasta \
    r1.fasta r2.fasta

... and each of those four pairs of sequences will end up in a separate pair of files named like fwd1-rev1.1.fasta etc.
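To get from a sample table like the one in the question to the two barcode files, something along these lines might work. It is only a sketch: `samples.txt` is a made-up file name, the columns are assumed to be sample ID, Barcode_5 number, Barcode_5 sequence, Barcode_7 number and Barcode_7 sequence, and whether the Barcode_7 sequences need to be reverse-complemented first depends on the read layout, so check that against the actual reads:

```shell
# Hypothetical sample sheet, whitespace-separated:
# sample ID, Barcode_5 number, Barcode_5 sequence, Barcode_7 number, Barcode_7 sequence.
cat > samples.txt <<'EOF'
A_1 501 TCCGATAT 701 GCTCATTAT
A_2 502 CGGAGATA 701 GCTCATTAT
EOF

# One FASTA record per distinct forward barcode (columns 2 and 3).
awk '!seen[$2]++ { printf ">%s\n%s\n", $2, $3 }' samples.txt > barcodes_fwd.fasta

# One FASTA record per distinct reverse barcode (columns 4 and 5).
awk '!seen[$4]++ { printf ">%s\n%s\n", $4, $5 }' samples.txt > barcodes_rev.fasta
```

With barcode files named this way, the `{name1}-{name2}` output files from the cutadapt command would be named after the barcode numbers (for example 501-701), which can then be mapped back to sample IDs from the table.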


This is exactly what I did in the end and it worked wonderfully.

Thank you very much for the tips!
