Question

Do Illumina Undetermined FASTQ files contain barcodes?

4

8.4 years ago

Paul ★ 1.5k

Dear all,

I have some questions about demultiplexing Illumina BCL files. First: do Illumina's Undetermined FASTQ files contain barcodes?

Second: I would like to demultiplex but keep the barcodes in the FASTQ files, not in the header as CASAVA does, but inside the read sequence. Does anybody have experience with how to do that?

Thank you for sharing your experience.

illumina demultiplexing barcodes • 9.5k views

ADD COMMENT • link updated 6.5 years ago by ptinto ▴ 200 • written 8.4 years ago by Paul ★ 1.5k

1

8.4 years ago

Ido Tamir 5.2k

Yes (but simply try it out).
FASTQ is a shitty format and there is no way to identifiably represent the index read in the read sequence (besides the name field). One possibility would be to write the index reads to a third or fourth file with --create-fastq-for-index-reads. Another would be to switch to BAM files, where you can add the index reads in tags.

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Ido Tamir 5.2k

Ram · Accepted Answer · 2015-11-27

6

8.4 years ago

Devon Ryan 104k

Yes, the undetermined files have the barcodes in the header of each read.

I haven't a clue how you would merge read #2 (the index read) into read #1, though perhaps playing with --use-bases-mask would do the trick (presumably by adding the index length to that of read 1). It would make more sense to me to just write read 2 as a separate file (so paired-end experiments will have all three reads rather than just two).

ADD COMMENT • link 8.4 years ago by Devon Ryan 104k

0

Thank you for the answer, Devon. I have Undetermined FASTQ files from a MiSeq (MSR does not print the barcode in the header the way bcl2fastq does, for example).

This is the first read of my undetermined FASTQ file:

@M03456:4:000000000-AGW03:1:1101:15913:1332 1:N:0:0
ATACAGACATATCTGTACGTGAACAATGCTGGTCCCTGTGGTTTGCCCACCTGCATCTATGCATTTGTTCACGTGGATGTACCCAAGTCCCTGGTCAGACCAGGAGCACCTCAGTGGCTCCTGGGAAAGAGGAAAGATCGGAAGAGCACAC
+
3>AABFFF4@D@GGFBGG4BF4FGBFBAEGHBC5AFFGFFFGCG?HH3FGECHAGHBGHHDGFDFFFFFGHGGGHAGHGHHEHFDEEGFHHHGHGGFGHHHGHEEEEAHGHHHGGFHGHFAGFCE0CGFFGE0B3B33FFADF?0BG1CGH

I am not sure how MiSeq Reporter works when it creates the Undetermined FASTQ files.

Thank you.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Paul ★ 1.5k

0

Ah, I never have to handle our MiSeq data, sorry.

ADD REPLY • link 8.4 years ago by Devon Ryan 104k

0

Why don't you use bcl2fastq?

ADD REPLY • link 8.4 years ago by Ido Tamir 5.2k

0

Were you able to fix it? I have the same problem...

ADD REPLY • link 3.7 years ago by juan_lu_97 • 0

score 2 · Accepted Answer · 2017-10-15

MiSeq Reporter (MSR) does not put the index sequences in the header; it puts the index rank (the row in the sample sheet) instead. If the index of a read is not found in the sample sheet, it puts a 0 (the last `:0` in the header of the undetermined reads).
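To make the header layout concrete, here is a minimal Python sketch that splits a CASAVA 1.8+ style header into its fields and checks the last one. The names `IlluminaHeader`, `parse_header` and `is_undetermined_by_rank` are made up for illustration, and the sketch assumes the header has exactly the seven-plus-four colon-separated fields seen in the read posted above.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    instrument: str
    run_number: str
    flowcell: str
    lane: str
    tile: str
    x: str
    y: str
    read: str
    is_filtered: str
    control_number: str
    index_field: str  # barcode sequence (bcl2fastq) or sample number (MSR)


def parse_header(line: str) -> IlluminaHeader:
    """Split a CASAVA 1.8+ style FASTQ header into its named fields."""
    left, right = line.rstrip("\n").lstrip("@").split(" ", 1)
    return IlluminaHeader(*left.split(":"), *right.split(":"))


def is_undetermined_by_rank(header: IlluminaHeader) -> bool:
    """True when the last field is the sample number 0 (no sample matched)."""
    return header.index_field == "0"


header = parse_header("@M03456:4:000000000-AGW03:1:1101:15913:1332 1:N:0:0")
print(header.flowcell, header.index_field, is_undetermined_by_rank(header))
```

With a bcl2fastq-style header the same last field would hold the barcode sequence itself, so `is_undetermined_by_rank` only makes sense for MSR output.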

With bcl2fastq you can have the index reads written as separate FASTQ files with the option `--create-fastq-for-index-reads`.

You can set MiSeq Reporter to produce FASTQ files for the index reads by modifying MiSeqReporter.config with:

    <appSettings>
        <add key="CreateFastqForIndexReads" value="1" />
    </appSettings>
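Once the index reads exist as their own FASTQ file (from either tool), getting the barcode inside the read sequence, as asked in the question, is a small post-processing step. The following is only a sketch: `read_fastq` and `prepend_index` are hypothetical helpers, and it assumes uncompressed four-line records with both files listing the same clusters in the same order.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def prepend_index(reads: TextIO, index_reads: TextIO, out: TextIO) -> int:
    """Write each read with its index read glued to the front; return the count."""
    written = 0
    for (r_head, r_seq, r_qual), (i_head, i_seq, i_qual) in zip(
        read_fastq(reads), read_fastq(index_reads)
    ):
        # Compare the cluster name (text before the first space) in both files.
        if r_head.split(" ", 1)[0] != i_head.split(" ", 1)[0]:
            raise ValueError(f"read name mismatch: {r_head} vs {i_head}")
        out.write(f"{r_head}\n{i_seq}{r_seq}\n+\n{i_qual}{r_qual}\n")
        written += 1
    return written


# Toy in-memory example; real input would be the R1 and index FASTQ files.
r1 = io.StringIO("@c1 1:N:0:0\nAAAA\n+\nIIII\n")
i1 = io.StringIO("@c1 1:N:0:0\nGT\n+\n##\n")
merged = io.StringIO()
prepend_index(r1, i1, merged)
print(merged.getvalue(), end="")
```

Note that `zip` stops at the shorter file, so a truncated index file would go unnoticed; a stricter version would check that both inputs are exhausted together.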

If you want to explore the indexes in the undetermined reads, to check whether you misspelled an index in the sample sheet and are losing that sample in demultiplexing, there is a file in the flowcell folder that records every index found and how many times it was found (you can also access this data from MSR), but I cannot remember the path right now (the file is different on MiSeq and on NextSeq 500).
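If that file cannot be located, a rough alternative is to tally the sequences in an index-read FASTQ directly. This is a sketch under assumptions, not what MSR does internally: `count_index_sequences` and `top_indexes` are illustrative names, and the input is assumed to be a four-line-per-record FASTQ of index reads.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_index_sequences(lines: Iterable[str]) -> Counter:
    """Count the sequence line (2nd of every 4 lines) of an index-read FASTQ."""
    counts: Counter = Counter()
    for line_number, line in enumerate(lines):
        if line_number % 4 == 1:
            counts[line.strip()] += 1
    return counts


def top_indexes(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent index sequences with their counts."""
    return count_index_sequences(lines).most_common(n)


# Toy example; with real data, pass an open file handle instead, e.g.
# gzip.open(path_to_index_fastq, "rt"), where the path is yours to fill in.
toy = ["@a\n", "ACGT\n", "+\n", "IIII\n",
       "@b\n", "ACGT\n", "+\n", "IIII\n",
       "@c\n", "TTTT\n", "+\n", "IIII\n"]
print(top_indexes(toy, 2))
```

An index that shows up very often among the undetermined reads but differs from a sample-sheet entry by a base or two is a good candidate for a typo in the sample sheet.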