Demultiplexing "Undetermined" ddRAD data
3.1 years ago
kulzer • 0

Hello! I could use some help demultiplexing our ddRAD data. We have HiSeq 4000 paired-end data back from the sequencing facility, but they did not demultiplex it for us, so we have "Undetermined...fastq.gz" files with an "N" preceding each barcode in the headers and at the start of the sequences themselves. I will paste an example below:

@K00337:359:HGJV5BBXY:7:1101:1499:1314 2:N:0:NATCCATG+NATTCATG
NCGCATCATGAACCATTACCGTTCAAAATTCCAGAGAGACTATAATACCTGTGATATGTAGGATTACTGAGATAAATTAATGATCCAATAGCCTGTATGTTTAAACTAGATCTTTGTTAGTATTACATAGAGCTATGGGTTGTAATTTTTC
+
#A<FF<FFJJJAFJJJ<FFJJJJJJFAJJJJJJJJJJAFJAJJJAJFFJJJ<A7JJJJJJJJ<FJ-AFFFJAFJFJJFJAJJ-AAFAFJFJJJF7F-7F<JJF-7-7FFFA<-<AFFAF7F<FAJJJ----<--7A<--A-<JA-F--<--
@K00337:359:HGJV5BBXY:7:1101:1681:1314 2:N:0:NGCCATCT+NACCGAGC
NTGCTGTCATGCTCTGATATCAGGCGGCTGTGGTCACACATCTCCTCTCGCTGTGGCCGAACCAGAAGCAGATATGAATGCAGGCTGCCTAAATTCTTCCTACTGCACTCCTTTCGGAGATTGCTGATCGTATTGTACTGCCCCCAGAACC
+
#A<<F--7FJJFFJFJ-F-7FJJJ-<FFFAA-<J<JJJJJ-F-FJF<JF7JA<<-<J---7FA-<-A7AF<-AJJ---<-<-77AF--7FF--7-----777A-7-7-77<-7-7--7-7-AF7----77--A-------A)-)))))---
@K00337:359:HGJV5BBXY:7:1101:1824:1314 2:N:0:NCCTATCA+NCTACGCC
NACATGTGGCAAGAAAGGAGGAAAAAAAGAGAGGAGGAGGAGCCAGGCTATTTTTAGCAATCAGATCTTATGGAAACTAATACTGAGAAGTCACTCGTTACGATGGCGGGGAGTCTGCAATTAGCTCGCCCCACGCTCGTCCAGGCTTCTG

We are hoping to demultiplex first with only the i7 index (our i5 index is being used to identify PCR duplicates). Then we will demultiplex once more using our inline barcodes.
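For illustration, here is a minimal Python sketch of that first step: pulling the index pair out of a header like the ones above and matching the first index against a sample sheet while ignoring the leading "N". It assumes the index before the "+" is the i7 index, and the sample names and barcode sequences are hypothetical placeholders, not our real ones. It is only a sketch of the idea, not a replacement for a proper demultiplexer.

```python
def split_indices(header):
    """Return (first_index, second_index) from a header such as
    '@K00337:...:1314 2:N:0:NATCCATG+NATTCATG'.
    Assumption: the index pair is the fourth colon-separated field
    of the comment that follows the first space."""
    comment = header.strip().split(" ", 1)[1]
    index_field = comment.split(":", 3)[3]
    first, _, second = index_field.partition("+")
    return first, second


def mismatches(observed, expected, skip=1):
    """Count differing positions, ignoring the first `skip` bases
    (the position that was called as N in every read)."""
    return sum(1 for o, e in zip(observed[skip:], expected[skip:]) if o != e)


def assign_sample(observed, barcodes, max_mismatches=1, skip=1):
    """Return the single sample whose barcode is within max_mismatches,
    or None when there is no hit or more than one hit."""
    hits = [
        name
        for name, seq in barcodes.items()
        if len(seq) == len(observed)
        and mismatches(observed, seq, skip) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical sample sheet, for illustration only.
barcodes = {"sample_A": "CATCCATG", "sample_B": "AGCCATCT"}

header = "@K00337:359:HGJV5BBXY:7:1101:1499:1314 2:N:0:NATCCATG+NATTCATG"
i7, i5 = split_indices(header)
sample = assign_sample(i7, barcodes)
```

One caveat with masking the first base: two barcodes that differ only at that position become indistinguishable, which is why the sketch returns None when more than one barcode matches.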

We are currently losing 100% of our reads to ambiguous barcode drops (which we assume relates to this N insertion), even though we are specifying that we will allow for at least one mismatch.

Does anyone have any ideas on what program we could use to demultiplex this?

And has anyone ever encountered this issue? We are trying to figure out what went wrong so we can prevent this with future libraries.


Are there N's in the first position for all the reads?
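A quick way to check is to tally the header indices directly. The following is a rough Python sketch that assumes standard four-line FASTQ records and that the index pair is the last colon-separated field of each header; the function names and the labels "first" and "second" are made up for this example.

```python
import gzip
from collections import Counter


def tally_leading_n(lines):
    """Count how many header indices start with 'N'.
    `lines` is any iterable of FASTQ lines (four lines per record)."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 != 0:  # only header lines carry the index pair
            continue
        index_field = line.strip().rsplit(":", 1)[-1]
        for label, index in zip(("first", "second"), index_field.split("+")):
            counts[label + "_total"] += 1
            if index.startswith("N"):
                counts[label + "_leading_N"] += 1
    return counts


def tally_file(path):
    """Run the tally on a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return tally_leading_n(handle)
```

If the "leading_N" counts equal the totals, every read has the N in that position.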

Can you try deML, which is described in this comment (A: demultiplexing tool for dual-indexed paired-end Illumina libraries)?


Thank you for replying so quickly! Yes, there is an N in that first position for all of our reads. Would you happen to know why this is the case?

We did start working with deML this morning after finding a thread similar to the one you've linked (Demultiplexing based on dual indices in headers while allowing 1 mismatch to each index), but we are getting a persistent error that we are trying to troubleshoot.
