I'm building a system to generate FASTQ files in the same form as those output by the HiSeq 3000 and HiSeq 4000. We will use them as gold standards to test our internal systems.
Since we work with microbiome data, a standard sample may contain around 1500 different species. Is there a bias in the output in how close together records from the same assembly appear, or is the order truly random?
If record 1,1 is from assembly 2913, what are the odds that record 1,2 is from that assembly as well (assuming 1500 species with the same strain length)?
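For reference, the baseline I have in mind is the fully random case: if every read is drawn independently and all 1500 species are equally abundant, the chance that the next record comes from the same assembly should be 1/1500 (about 0.067%). Below is a minimal Python sketch of that null model. Independence and equal abundance are my assumptions here, not known properties of the instrument, and the function name `adjacent_same_fraction` is only illustrative.

```python
import random


def adjacent_same_fraction(n_species=1500, n_reads=200_000, seed=0):
    """Estimate how often two consecutive records share a species.

    Assumption: each read is drawn independently and uniformly from
    n_species equally abundant species (a null model, not real HiSeq data).
    """
    rng = random.Random(seed)
    # Assign a random species label to every simulated record.
    labels = [rng.randrange(n_species) for _ in range(n_reads)]
    # Count neighbouring pairs whose labels match.
    same = sum(a == b for a, b in zip(labels, labels[1:]))
    return same / (n_reads - 1)


if __name__ == "__main__":
    print("expected under the null model:", 1 / 1500)
    print("simulated fraction:", adjacent_same_fraction())
```

If real output showed a clearly higher fraction of same-assembly neighbours than this baseline, that would point to the kind of bias I am asking about.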
I couldn't find any paper on this, so any help from experience would be great.