I am expanding on my previous answer on request, and also making a case for illumina2bam.
Illumina allows combining multiple libraries in one lane by multiplexing. It does this with an additional read that reads a short sequence within the adapter after the first read has been read. This results in the read order read1:index-read for single-end runs and read1:index-read:read2 for paired-end runs, so in a paired-end run read 2 ends up labelled /3. It seems that in your case an indexed run was specified; maybe it was necessary for other lanes, or the wrong program was chosen.
Using a separate index read is superior to simply adding a short barcode at the beginning of the product, because you have fewer problems with base calling (the start of the reads has normal complexity).
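To make the numbering concrete, here is a minimal sketch of the read order described above. The function name `read_roles` and the role labels are my own, purely for illustration; they are not part of any Illumina software.

```python
def read_roles(paired_end, indexed=True):
    """Map each read-number suffix to its role, in the order the machine reads them.

    Hypothetical helper for illustration only: the index read, if present,
    is sequenced directly after read 1 and before read 2.
    """
    roles = ["read1"]
    if indexed:
        roles.append("index")
    if paired_end:
        roles.append("read2")
    # Suffixes are simply assigned in sequencing order, starting at /1.
    return {f"/{i}": role for i, role in enumerate(roles, start=1)}


print(read_roles(paired_end=True))
# {'/1': 'read1', '/2': 'index', '/3': 'read2'}
```

With `indexed=False` the same paired-end run would give `/1` and `/2` only, which is why a `/3` file points to an indexed run.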
I suggest that everyone involved in collecting data from the machine have a look at BAM as the primary output format instead of FASTQ, and maybe push for it:
- You have fewer problems with the scale of the quality values, which has changed 4 times by now.
- More important: all the provenance information is saved within the file, and if you have a correctly working pipeline set up (I am far from that :-( ), all programs record their transformations of the data in the file. You know exactly what happened (which parameters, which version, etc.).
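As a small illustration of the quality-scale problem: the same FASTQ quality string means different things depending on the ASCII offset (33 for Sanger and Illumina 1.8+, 64 for the older Illumina 1.3 to 1.7 pipelines), whereas BAM stores the Phred values themselves. The helper names below are made up for this sketch, and it deliberately ignores the even older Solexa scale.

```python
def decode_quality(qual, offset):
    """Turn a FASTQ quality string into Phred scores for a given ASCII offset."""
    return [ord(c) - offset for c in qual]


def phred64_to_phred33(qual):
    """Re-encode a Phred+64 quality string as Phred+33 (shift by 64 - 33 = 31)."""
    return "".join(chr(ord(c) - 31) for c in qual)


# The same Phred score of 40 looks different in the two encodings.
print(decode_quality("II", 33))    # [40, 40]
print(decode_quality("hh", 64))    # [40, 40]
print(phred64_to_phred33("hh"))    # II
```

Decoding a Phred+64 string with offset 33 silently gives scores that are 31 too high, which is exactly the kind of mistake a format with a fixed scale avoids.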
To my knowledge, two possibilities exist:
* [illumina2bam], which reads directly from the saved bcl files, and it's **easy** to use!
* Picard's [IlluminaBasecallsToSam], which I think starts from the qseq files.
In the case of illumina2bam there is a great pipeline that takes the base calls and puts the index read into the tags of each read in the BAM file. Easy to parse, easy to split, merge, etc.
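To show how easy the splitting is, here is a rough sketch that groups SAM-formatted records by a barcode tag using only plain Python. I am assuming the index read sits in the `BC` tag, which is the tag the SAM specification reserves for barcode sequences; check which tag your pipeline actually writes. The function names and the two example records are invented for illustration.

```python
from collections import defaultdict


def barcode_of(sam_line, tag="BC"):
    """Return the value of an optional tag (assumed: BC) from one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for opt in fields[11:]:
        name, _type, value = opt.split(":", 2)
        if name == tag:
            return value
    return None


def split_by_barcode(sam_lines, tag="BC"):
    """Group SAM records by barcode; records without the tag go under None."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        groups[barcode_of(line, tag)].append(line)
    return dict(groups)


# Invented unaligned records, only to show the shape of the data.
records = [
    "@HD\tVN:1.4\tSO:unsorted",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tBC:Z:ATCACG",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\tBC:Z:CGATGT",
]
for barcode, lines in split_by_barcode(records).items():
    print(barcode, len(lines))
```

For real BAM files you would read the records with a BAM library or pipe them through `samtools view` first; the grouping logic stays the same.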
modified 8.2 years ago
8.2 years ago by
Ido Tamir ♦ 5.0k