We have developed a pipeline that currently works on both 454 and Ion Torrent data.
The pipeline is always run on multiplexed data. The sequence input is currently a single fastq file that contains all of the information needed to demultiplex the data, and all subsequent analysis is run on the data from each MID separately.
Collaborators have now generated similar data using an Illumina MiSeq. However, the data they sent us is already demultiplexed, with the tags etc. stripped.
What I want to know is: is there any way to create a single fastq/sff/etc. file from the MiSeq output data (or during data generation) that still has the MIDs etc. on the reads and has all the data together in one file?
I've read extensively on this, and it seems that the best way to do it is to convert the multiple .bcl files to fastq.
Is there a better/easier way to do this?
I'm not completely sure if you know this already, but Illumina indexes are not sequenced as part of the forward or reverse read; they are read in a separate index read. This is completely different from 454/Ion Torrent, where you can identify the MID at the start of each sequence. You can convert the bcl files to fastq and keep the index, but you will get three read files: R1 (forward), R2 (index) and R3 (reverse). Analysis programs like QIIME use these files to demultiplex the data (split_libraries_fastq.py).
To do this "non-demultiplexing" BCL conversion you'll need the entire run directory (usually around 30-60 GB). However, I don't think this is going to help you much.
As a suggestion, how about you just add the specific MID sequence to the start of each read in the demultiplexed sample fastq files? It would be relatively simple to set up in a script, and the output could then be fed into your current pipeline.
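A rough sketch of that script in Python, in case it helps. Several things here are my assumptions rather than anything from your pipeline: the function names are made up, the fastq files are assumed to have exactly four lines per record (no wrapped sequences), the qualities are assumed to be Phred+33, and the prepended MID bases get a placeholder quality character ("I", i.e. Q40) because the real index qualities are not in the demultiplexed files. For paired-end data this would only make sense on the forward reads.

```python
import gzip


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def iter_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples.

    Assumes a strict 4-lines-per-record FASTQ (no wrapped sequences).
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, plus, qual


def add_mid(record, mid, qual_char="I"):
    """Prepend the MID to the sequence and pad the quality string to match.

    qual_char is a placeholder quality (Phred+33 "I" = Q40), an assumption,
    since the real index qualities are not available after demultiplexing.
    """
    header, seq, plus, qual = record
    return header, mid + seq, plus, qual_char * len(mid) + qual


def merge_with_mids(sample_to_mid, out_path, qual_char="I"):
    """Write all reads from all samples to one file, each with its MID prepended.

    sample_to_mid maps a per-sample FASTQ path to that sample's MID sequence.
    Returns the number of reads written.
    """
    n_written = 0
    with open_text(out_path, "wt") as out:
        for fastq_path, mid in sample_to_mid.items():
            with open_text(fastq_path) as handle:
                for record in iter_fastq(handle):
                    out.write("\n".join(add_mid(record, mid, qual_char)) + "\n")
                    n_written += 1
    return n_written
```

You would call it with something like merge_with_mids({"sample1.fastq.gz": "ACGAGTGCGT", "sample2.fastq.gz": "ACGCTCGACA"}, "all_samples.fastq"), where the file names and MID sequences are only placeholders for your own. If your pipeline also expects a key or adapter sequence in front of the MID, that would need to be prepended in the same way.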