bar code manipulation
7.6 years ago
rob.costa1234 ▴ 310

I am working with Illumina sequencing data. As I understand it, the Illumina tool throws away any read whose barcode has even one mismatch. I would therefore like to first extract all the data, whatever the barcode, and then convert BCL to FASTQ. Is there any software or script that lets me extract all the data irrespective of barcode mismatches?

Thanks

bar code • 1.2k views

Illumina software (at least offline bcl2fastq v1 and v2) does NOT throw away data that has one barcode mismatch; in fact, allowing one mismatch is the default (for v2.x). You can allow more mismatches, as long as your barcodes support it.
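To make "as long as your barcodes support it" concrete, here is a minimal sketch that computes the smallest pairwise Hamming distance in a barcode set and derives how many mismatches can be tolerated without any ambiguity. It assumes equal-length barcodes and uses the conservative rule that k mismatches are safe when every pair of barcodes differs in at least 2k+1 positions; bcl2fastq's own collision check may be stricter or looser than this. The barcode list is a made-up example, not taken from any real sample sheet.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def max_safe_mismatches(barcodes):
    """Largest k such that a read with up to k barcode errors still
    matches exactly one barcode (needs min pairwise distance >= 2k + 1)."""
    min_dist = min(hamming(a, b) for a, b in combinations(barcodes, 2))
    return (min_dist - 1) // 2


# Hypothetical barcode set, for illustration only.
barcodes = ["ATCACG", "CGATGT", "TTAGGC"]
print(max_safe_mismatches(barcodes))
```

If the result is 0, even a single allowed mismatch could make some reads match two samples equally well.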

7.6 years ago
Amitm ★ 2.2k

Hi, you get all the data anyway. Following the sample sheet provided to BCL2Fastq, all reads with no mismatch in the barcode (or up to 1 mismatch, depending on the setting) are written to *.fastq.gz files, where * is the sample name from the SampleSheet. Reads whose barcode has more mismatches than the setting allows end up in Undetermined_S0_L001_R1_001.fastq.gz (or R2, for all the lanes). In older versions of BCL2Fastq there was a separate folder called Undetermined_indices/ under the Unaligned/ folder (i.e. the one obtained after successfully running BCL2Fastq). In newer versions of BCL2Fastq, all the FASTQ files (sample-assigned or not) are inside the Unaligned/ folder.

So you have all the data; the mismatched reads are just unassigned. Also, if you look at the FASTQ headers of the unassigned reads, the barcodes are given (as in the normal files). So you can try binning them into sample classes, if that's what you are after.
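A rough sketch of that binning in Python could look like the following. It assumes the barcode is the last colon-separated field of each header line (as in `@... 1:N:0:ATCACG`), that the index is single and the same length as the sample barcodes, and that a read is assigned only when exactly one sample barcode falls within the allowed number of mismatches. The function names and the `sample_barcodes` mapping are hypothetical, not part of any Illumina tool.

```python
import gzip


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_sample(barcode, sample_barcodes, max_mismatches):
    """Return the one sample whose barcode is within max_mismatches,
    or None if there is no such sample or more than one."""
    hits = [
        name
        for name, bc in sample_barcodes.items()
        if len(bc) == len(barcode) and hamming(bc, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def assign_reads(fastq_path, sample_barcodes, max_mismatches=2):
    """Yield (sample_or_None, record) for each 4-line FASTQ record."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as fh:
        while True:
            record = [fh.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            # Assumed header layout: barcode is the text after the last colon.
            barcode = record[0].rstrip().rsplit(":", 1)[-1]
            sample = closest_sample(barcode, sample_barcodes, max_mismatches)
            yield sample, "".join(record)
```

You would call it on the Undetermined file with a dict mapping sample name to barcode, e.g. `assign_reads("Undetermined_S0_L001_R1_001.fastq.gz", {"sampleA": "ATCACG"})`, and write each record to a per-sample file. Reads that come back as None stay unassigned. For paired-end data, R2 would need to be processed in step with R1 so the pairs stay in sync.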
