I routinely expect a 1GB data transfer across the internet to arrive without a single bit flip.
As a bioinformatics newbie, I have no reference point for how repeatable genomic sequencing currently is. This leads to a series of questions:
1) If I have my genome fully sequenced at two different, state-of-the-art sequencing centers (e.g. the best in America, the best in Europe), how similar would the results be, and how should we define and measure similarity? (I'm interested in reference-genome-quality sequences, not e.g. clinical quality.)
2) For that matter, if I use the same center, but on two different occasions separated by a couple of weeks, what should I expect?
3) Instead of a human genome, suppose a strain of E. coli was being analyzed -- how does that affect the repeatability?
4) Naively, I would assume I would receive back data organized as 46 contiguous chunks. Pragmatically, what should I expect? -- e.g. "you'll get haploid data, and the avg # of contigs per chromosome will be x."
5) Genomic coordinate systems: For me, similarity between two sequencing runs means, first, that there is a computationally simple way to translate the coordinates of a region of interest in one data set to the corresponding coordinates in the other, and second, that the sequence data matches. For instance, how does this work for human genome GRCh37 vs GRCh38 -- how would I go about translating the coordinates of, say, all the opsins in GRCh37 to their coordinates in GRCh38, and how likely would it be that there is even a single nucleotide mismatch over that limited portion of DNA?
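To make concrete the kind of "computationally simple" translation I'm imagining for question 5: my (possibly naive) mental model is that a mapping between two assemblies is just a list of aligned blocks, each shifting a run of old-assembly positions by a fixed offset, with gaps where the assemblies disagree. Here's a toy sketch of that model -- all coordinates and blocks below are invented for illustration, not real GRCh37/GRCh38 data:

```python
# Toy model of block-based coordinate translation between two assemblies.
# Each chain block maps a contiguous run of old-assembly positions to the
# new assembly at a fixed offset. All intervals/offsets here are made up.
def lift(pos, chain):
    """Map a 0-based position through (old_start, old_end, new_start) blocks.
    Returns the new coordinate, or None if the position falls in an unmapped gap."""
    for old_start, old_end, new_start in chain:
        if old_start <= pos < old_end:
            return new_start + (pos - old_start)
    return None  # position was deleted or rearranged in the new assembly

# Toy chain: two aligned blocks with an unmapped gap between them
toy_chain = [(1000, 2000, 1100), (2500, 4000, 3100)]

print(lift(1500, toy_chain))  # -> 1600
print(lift(2200, toy_chain))  # -> None (falls in the gap)
```

Is this roughly how tools for translating between GRCh37 and GRCh38 coordinates work in practice, or is the real mapping more complicated than a list of offset blocks?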
Hope those questions make sense. Thanks! nehal