I am trying to understand how linked reads are generated with the 10x Genomics Chromium system (sequenced on Illumina).
1.25 ng of gDNA is loaded into the machine, which has a capture efficiency of 40%, so 0.5 ng is captured. 0.5 ng = ~150 copies of the genome. This is partitioned into ~1 million Gel Bead-in-Emulsions (GEMs).
So for a 3 Gb genome,
it will be (150 * 3 * 10^9)/10^6 = 450 kb of DNA in one GEM.
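A quick sanity check of this arithmetic, as a minimal Python sketch. The average base-pair mass of 650 g/mol is my own assumption (it is not from the technical note), so the result lands close to, but not exactly on, 150 copies and 450 kb.

```python
# Back-of-the-envelope check of the per-GEM DNA amount.
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # ASSUMPTION: average mass of one dsDNA base pair

loaded_ng = 1.25             # gDNA loaded, from the post
capture_efficiency = 0.40    # from the post
genome_bp = 3e9              # 3 Gb genome
n_gems = 1e6                 # ~1 million GEMs

# Mass of one haploid genome copy in grams.
genome_mass_g = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO

# DNA that actually ends up in GEMs, and how many genome copies that is.
captured_ng = loaded_ng * capture_efficiency
copies = captured_ng * 1e-9 / genome_mass_g

# Total captured bases spread evenly over all GEMs.
dna_per_gem_bp = copies * genome_bp / n_gems

print(f"captured: {captured_ng:.2f} ng")
print(f"genome copies: {copies:.0f}")
print(f"DNA per GEM: {dna_per_gem_bp / 1e3:.0f} kb")
```

With these assumptions the estimate comes out in the same range as the 150 copies and 450 kb quoted above.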
As per the Illumina technical note, the average input gDNA molecule length is 50 kb. Within each GEM there are, on average, 10 gDNA molecules (I assume 10 fragments of ~50 kb each), i.e. ~1/7000 of the genome per GEM.
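The same numbers can be cross-checked in a few lines of Python. This is only a sketch using the values quoted in this post (450 kb per GEM, 50 kb average molecule, 3 Gb genome); the variable names are mine.

```python
# Cross-check: molecules per GEM and fraction of the genome per GEM.
dna_per_gem_bp = 450_000         # estimate from the calculation in the post
mean_molecule_bp = 50_000        # average input molecule length from the note
genome_bp = 3_000_000_000        # 3 Gb genome

# How many average-length molecules fit into the per-GEM DNA amount.
molecules_per_gem = dna_per_gem_bp / mean_molecule_bp

# What share of the genome a single GEM holds, expressed as "1 in N".
one_in = genome_bp / dna_per_gem_bp

print(f"molecules per GEM: {molecules_per_gem:.0f}")
print(f"genome fraction per GEM: 1/{one_in:.0f}")
```

This gives about 9 molecules per GEM and about 1/6667 of the genome, consistent with the rounded figures of 10 molecules and ~1/7000.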
My doubt is this: the ~10 different fragments (each ~50 kb) that end up in one GEM are all given the same barcode. How can we be sure that these 10 fragments come from nearby loci, so that they can be linked later using the barcode they share?
Or is the barcode only meant to link the short reads generated from a single 50 kb fragment? In that case, the ~500 kb allotted to one GEM may come from different regions of the genome, and those 10 regions will all carry the same barcode?
Is my question clear? Thank you in advance for any clarification.