How to "discover" the read structure and barcodes of an Illumina sequencing run directory
3.4 years ago
William ★ 5.3k

Is there a tool to discover the read structure and barcodes used in an arbitrary Illumina sequencing run directory?

I have some Illumina sequencing run directories where I don't have all the information needed to demultiplex the run.

By "Illumina sequencing run" I mean a run folder containing Data/Intensities/BaseCalls/.

Instead of trying many different read structures and barcodes, I thought it should be possible to discover the read structure and barcodes that were used, based on sequence over-representation.
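A rough sketch of that over-representation idea in Python. It assumes the reads have already been written to FASTQ with every cycle kept as template, so that index cycles are part of the read sequence. The function names, the k-mer length and the 100,000-read subsample are arbitrary illustrative choices, not part of any existing tool.

```python
import gzip
from collections import Counter
from itertools import islice


def fastq_sequences(path, max_reads=100_000):
    """Yield up to max_reads sequence lines (line 2 of every 4-line record)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in islice(fh, 1, 4 * max_reads, 4):
            yield line.rstrip("\n")


def overrepresented_kmers(sequences, k=8, top=3):
    """For each 0-based start offset, return the `top` most common k-mers
    found there, each paired with the fraction of reads carrying it."""
    per_offset = {}
    n_reads = 0
    for seq in sequences:
        n_reads += 1
        for offset in range(len(seq) - k + 1):
            per_offset.setdefault(offset, Counter())[seq[offset:offset + k]] += 1
    return {
        offset: [(kmer, n / n_reads) for kmer, n in counts.most_common(top)]
        for offset, counts in sorted(per_offset.items())
    }
```

Offsets inside genomic template should show many different k-mers, each covering a small fraction of reads. Offsets where a handful of k-mers each cover a large fraction are candidate index (or other fixed-sequence) cycles. Low-complexity runs such as poly-G or N stretches can also stand out, so look at the k-mers themselves. Hypothetical usage: `overrepresented_kmers(fastq_sequences("MY_RUN.1.fastq.gz"))`, where the file name is only a placeholder.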

3.3 years ago
GenoMax 141k

I thought it should be possible to discover the used read structure and barcodes.

Take a look at the RunInfo.xml file in the top-level run directory; its <Reads> section shows the read structure. For example, a 2x150 run with dual 8-bp indexes looks like this:

    <Reads>
        <Read Number="1" NumCycles="150" IsIndexedRead="N"/>
        <Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
        <Read Number="3" NumCycles="8" IsIndexedRead="Y"/>
        <Read Number="4" NumCycles="150" IsIndexedRead="N"/>
    </Reads>
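The <Reads> block can be translated into a Picard-style read structure string with the standard library. This is a sketch under two assumptions: indexed reads are treated as sample barcodes (B) and all other reads as template (T), so UMIs or skipped cycles are not inferred; and the file has no default XML namespace. The function names are hypothetical. For a real run, pass `Path("RunInfo.xml").read_text()` instead of the inline example.

```python
import xml.etree.ElementTree as ET


def read_structure_from_runinfo(xml_text):
    """Translate <Read> elements into e.g. '150T8B8B150T'.

    Assumption: IsIndexedRead="Y" -> barcode (B), everything else -> template (T).
    """
    root = ET.fromstring(xml_text)
    reads = sorted(root.iter("Read"), key=lambda r: int(r.get("NumCycles") and r.get("Number")))
    return "".join(
        f'{r.get("NumCycles")}{"B" if r.get("IsIndexedRead") == "Y" else "T"}'
        for r in reads
    )


def all_template_structure(xml_text):
    """All cycles declared as one template segment, e.g. '316T'."""
    root = ET.fromstring(xml_text)
    return f'{sum(int(r.get("NumCycles")) for r in root.iter("Read"))}T'


EXAMPLE = """<RunInfo><Run><Reads>
  <Read Number="1" NumCycles="150" IsIndexedRead="N"/>
  <Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
  <Read Number="3" NumCycles="8" IsIndexedRead="Y"/>
  <Read Number="4" NumCycles="150" IsIndexedRead="N"/>
</Reads></Run></RunInfo>"""

print(read_structure_from_runinfo(EXAMPLE))  # 150T8B8B150T
print(all_template_structure(EXAMPLE))       # 316T
```

The second helper gives the "everything is template" structure, which is useful when you want the index cycles kept inside the read sequence instead of being split off.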

As for the barcodes, you can run bcl2fastq with a blank sample sheet so that all reads are written to the "Undetermined" FASTQ files. Then use the code here to identify the indexes present in your data: C: Demultiplexing reads with index present in the labels
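The linked post has its own code. As an independent sketch, the indexes can also be tallied straight from the Undetermined FASTQ headers. This assumes the usual bcl2fastq header layout, where the comment field ends in the index sequence(s), e.g. `1:N:0:ACGTACGT+TGCATGCA` for a dual-indexed read. The function names are hypothetical.

```python
import gzip
from collections import Counter
from itertools import islice


def count_indexes(lines, max_reads=None):
    """Count index strings ('I7' or 'I7+I5') from FASTQ header lines.

    Assumes bcl2fastq-style headers: '@<read id> <read>:<filter>:<ctrl>:<index>'.
    """
    counts = Counter()
    headers = islice(lines, 0, None, 4)  # the header is every 4th line
    if max_reads is not None:
        headers = islice(headers, max_reads)
    for header in headers:
        parts = header.rstrip("\n").split(" ", 1)
        if len(parts) == 2:  # skip headers without a comment field
            counts[parts[1].rsplit(":", 1)[-1]] += 1
    return counts


def print_top_indexes(path, n=20, max_reads=1_000_000):
    """Print the n most frequent indexes in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as fh:
        for index, count in count_indexes(fh, max_reads).most_common(n):
            print(index, count, sep="\t")
```

For example, `print_top_indexes("Undetermined_S0_L001_R1_001.fastq.gz")`. Real sample barcodes should appear as a small set of sequences with high counts, followed by a long tail of sequencing-error variants.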

After that, you can either create a sample sheet and re-run bcl2fastq, or use demuxbyname.sh from the BBMap suite to demultiplex the existing files. Either way, this only works if you know which barcode belongs to which Sample_ID.
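If you go the bcl2fastq route, a minimal sample sheet can be generated from a Sample_ID-to-barcode mapping. This sketch assumes bcl2fastq accepts a sheet containing only a [Data] section; add [Header]/[Settings] sections if your version requires them. The barcodes are written exactly as given, in the same `I7` or `I7+I5` form they appear in the FASTQ headers. The sample IDs and sequences in the example are hypothetical.

```python
import csv
import io


def build_sample_sheet(barcodes):
    """Return a minimal [Data]-only sample sheet as a string.

    barcodes maps Sample_ID -> 'I7' or 'I7+I5'. An index2 column is
    added only if at least one barcode is dual.
    """
    dual = any("+" in bc for bc in barcodes.values())
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["[Data]"])
    writer.writerow(["Sample_ID", "Sample_Name", "index"] + (["index2"] if dual else []))
    for sample_id, bc in barcodes.items():
        i7, _, i5 = bc.partition("+")  # i5 is '' for single-index barcodes
        writer.writerow([sample_id, sample_id, i7] + ([i5] if dual else []))
    return buf.getvalue()


# Hypothetical sample IDs and barcodes:
print(build_sample_sheet({"sample_A": "ACGTACGT+TGCATGCA", "sample_B": "GGTTCCAA+AACCGGTT"}))
```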

3.3 years ago
William ★ 5.3k

Just convert all cycles to FASTQ, without demultiplexing and without trimming.

Note that you need to declare the entire read length (all cycles) as template, i.e. 148T in this example.

    picard IlluminaBasecallsToFastq \
        B=./{MY_RUN}/Data/Intensities/BaseCalls/ \
        L=1 \
        RS=148T \
        INCLUDE_NON_PF_READS=false \
        COMPRESS_OUTPUTS=true \
        RUN_BARCODE=MY_RUN \
        OUTPUT_PREFIX=MY_RUN \
        READ_NAME_FORMAT=ILLUMINA \
        NUM_PROCESSORS=1 \
        IGNORE_UNEXPECTED_BARCODES=false \
        FORCE_GC=false

Then grep for the expected barcodes to see at which positions (cycles) in the reads they are located.
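As an alternative to grep, a small script can tally the offsets at which each expected barcode occurs across the full-length reads, which directly shows the cycles the index reads occupy. This is a sketch: it finds exact matches only, in the orientation you supply. Depending on the instrument, the i5 index may be read as its reverse complement, so you may want to include reverse complements in the list. The file name and barcodes in the usage comment are hypothetical.

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(path, max_reads=100_000):
    """Yield up to max_reads sequence lines from a (gzipped) FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in islice(fh, 1, 4 * max_reads, 4):
            yield line.rstrip("\n")


def barcode_positions(sequences, barcodes):
    """Count (barcode, 0-based offset) pairs for every exact occurrence."""
    hits = Counter()
    for seq in sequences:
        for bc in barcodes:
            start = seq.find(bc)
            while start != -1:
                hits[(bc, start)] += 1
                start = seq.find(bc, start + 1)
    return hits


# Hypothetical usage (cycle number = offset + 1):
# hits = barcode_positions(read_sequences("MY_RUN.1.fastq.gz"), ["ACGTACGT", "TGCATGCA"])
# for (bc, offset), n in hits.most_common(10):
#     print(bc, offset + 1, n)
```

A real barcode should pile up at one offset (for example, right after the first template read), while chance matches are spread thinly across many offsets.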
