Question

Demultiplexing "Undetermined" ddRAD data


3.1 years ago

kulzer • 0

Hello! I could use some help demultiplexing our ddRAD data. We have HiSeq 4000 paired-end data back from the sequencing facility, but they did not demultiplex it for us, so we have "Undetermined...fastq.gz" files with an "N" preceding each barcode in the headers as well as at the start of the sequences themselves. I will paste an example below:

@K00337:359:HGJV5BBXY:7:1101:1499:1314 2:N:0:NATCCATG+NATTCATG
NCGCATCATGAACCATTACCGTTCAAAATTCCAGAGAGACTATAATACCTGTGATATGTAGGATTACTGAGATAAATTAATGATCCAATAGCCTGTATGTTTAAACTAGATCTTTGTTAGTATTACATAGAGCTATGGGTTGTAATTTTTC
+
#A<FF<FFJJJAFJJJ<FFJJJJJJFAJJJJJJJJJJAFJAJJJAJFFJJJ<A7JJJJJJJJ<FJ-AFFFJAFJFJJFJAJJ-AAFAFJFJJJF7F-7F<JJF-7-7FFFA<-<AFFAF7F<FAJJJ----<--7A<--A-<JA-F--<--
@K00337:359:HGJV5BBXY:7:1101:1681:1314 2:N:0:NGCCATCT+NACCGAGC
NTGCTGTCATGCTCTGATATCAGGCGGCTGTGGTCACACATCTCCTCTCGCTGTGGCCGAACCAGAAGCAGATATGAATGCAGGCTGCCTAAATTCTTCCTACTGCACTCCTTTCGGAGATTGCTGATCGTATTGTACTGCCCCCAGAACC
+
#A<<F--7FJJFFJFJ-F-7FJJJ-<FFFAA-<J<JJJJJ-F-FJF<JF7JA<<-<J---7FA-<-A7AF<-AJJ---<-<-77AF--7FF--7-----777A-7-7-77<-7-7--7-7-AF7----77--A-------A)-)))))---
@K00337:359:HGJV5BBXY:7:1101:1824:1314 2:N:0:NCCTATCA+NCTACGCC
NACATGTGGCAAGAAAGGAGGAAAAAAAGAGAGGAGGAGGAGCCAGGCTATTTTTAGCAATCAGATCTTATGGAAACTAATACTGAGAAGTCACTCGTTACGATGGCGGGGAGTCTGCAATTAGCTCGCCCCACGCTCGTCCAGGCTTCTG

We are hoping to demultiplex first with only the i7 index (our i5 index is being used to determine PCR duplicates). Then we will demultiplex once more using our inline barcodes.

We are currently losing 100% of our reads to ambiguous barcode drops (which we assume relates to this N insertion), even though we are specifying that we will allow for at least one mismatch.
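To make the idea concrete, here is a minimal Python sketch of header-based demultiplexing on the i7 index alone, where an N in the read's index is skipped instead of being counted as a mismatch. It assumes the last header field is written as i7+i5 (index 1 first), and the sample names and i7 sequences in `sample_sheet` are hypothetical placeholders, not our real barcodes. It is only meant to illustrate the matching logic, not to replace a dedicated tool.

```python
def parse_indexes(header):
    """Return (i7, i5) from a header such as
    '@K00337:359:HGJV5BBXY:7:1101:1499:1314 2:N:0:NATCCATG+NATTCATG'.
    Assumes the index field is written as i7+i5."""
    index_field = header.rstrip().split(" ")[1].split(":")[3]
    i7, _, i5 = index_field.partition("+")
    return i7, i5


def mismatches(observed, expected):
    """Count mismatches, skipping positions where the read has an N."""
    if len(observed) != len(expected):
        # Different lengths cannot be compared base by base; treat as no match.
        return max(len(observed), len(expected))
    return sum(1 for o, e in zip(observed, expected) if o != "N" and o != e)


def assign_sample(i7, sample_sheet, max_mismatches=1):
    """Return the single sample whose i7 is within max_mismatches, else None."""
    hits = [name for name, index in sample_sheet.items()
            if mismatches(i7, index) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical sample sheet: names and first bases are made up for illustration.
sample_sheet = {"sampleA": "CATCCATG", "sampleB": "TGCCATCT"}

header = "@K00337:359:HGJV5BBXY:7:1101:1499:1314 2:N:0:NATCCATG+NATTCATG"
i7, i5 = parse_indexes(header)
print(i7, i5, assign_sample(i7, sample_sheet))
```

Returning None when more than one sample matches keeps ambiguous reads out of the per-sample files, which matters more than usual here because the first base of every index is unknown.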

Does anyone have any ideas on what program we could use to demultiplex this?

And has anyone ever encountered this issue? We are trying to figure out what went wrong so we can prevent this with future libraries.

ddRAD • 708 views



Are there Ns in that first position for all the reads?
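One quick way to check is to tally where the Ns fall in the index field of every header. The following is a rough Python sketch, assuming the index field is the last colon-separated item in the header and is written as i7+i5; the function names are just illustrative.

```python
import gzip
from collections import Counter
from itertools import islice


def fastq_headers(path):
    """Yield the header line of every record in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        # Headers are every fourth line, starting with the first.
        yield from islice(handle, 0, None, 4)


def n_position_counts(headers):
    """Return (number of reads, Counter of (index name, position) holding an N)."""
    counts = Counter()
    total = 0
    for header in headers:
        total += 1
        index_field = header.rstrip().rsplit(":", 1)[-1]
        for name, index in zip(("i7", "i5"), index_field.split("+")):
            for position, base in enumerate(index):
                if base == "N":
                    counts[(name, position)] += 1
    return total, counts


# The three headers posted in the question.
example_headers = [
    "@K00337:359:HGJV5BBXY:7:1101:1499:1314 2:N:0:NATCCATG+NATTCATG",
    "@K00337:359:HGJV5BBXY:7:1101:1681:1314 2:N:0:NGCCATCT+NACCGAGC",
    "@K00337:359:HGJV5BBXY:7:1101:1824:1314 2:N:0:NCCTATCA+NCTACGCC",
]
total, counts = n_position_counts(example_headers)
print(total, dict(counts))
```

For a real file, pass `fastq_headers(path)` to `n_position_counts` instead of the example list. If the count at position 0 equals the number of reads for both indexes, the N is present in every read.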

Can you try deML (see "A: demultiplexing tool for dual-indexed paired-end illumina libraries"), which is described in this comment?

ADD REPLY • link 3.1 years ago by GenoMax 141k


Thank you for replying so quickly! Yes, there is an N in that first position for all of our reads. Would you happen to know why this is the case?

We did start working with deML this morning after finding a thread similar to the one you've linked ("Demultiplexing based on dual indices in headers while allowing 1 mismatch to each index"), but we are getting a persistent error that we are still troubleshooting.

ADD REPLY • link updated 3.1 years ago by GenoMax 141k • written 3.1 years ago by kulzer • 0