Question: 23andMe Coverage
13
8.3 years ago by
United States
Aleksandr Levchuk3.1k wrote:

I would like to use 23andMe to get some raw data out of my genome.

23andMe sends the samples to LabCorp, which uses the Illumina OmniExpress Plus Genotyping BeadChip.

I don't know much about this technology; I am much more familiar with the Illumina GA II sequencer and recently completed training for the HiSeq 2000. I assume that the BeadChip genotype information can be converted into a partial genomic sequence. Indels and transposons would not be detectable, but SNPs would show up.

What percentage of the human genome can this technology cover?

For me the preferable definition of "covered" is:

The number of letters determined (above some accuracy threshold) divided by 2.9 billion.

Sub-questions:

  • Can the technology multiplex multiple individuals into one run?
  • How much do supplies cost to do one run?
illumina snp human genotyping • 8.9k views
modified 8.3 years ago • written 8.3 years ago by Aleksandr Levchuk3.1k
5

Did anybody even think to question the hilarious promises this company makes on their front page? "from baldness to muscle performance" ... "Know your predicted response to drugs, from blood thinners to coffee." Did anyone care to ask a geneticist about the actual probabilities behind GWAS studies? I almost lol'ed at their shallow promises....

written 8.3 years ago by Michael Dondrup45k

+1 to Michael Dondrup. Yes, I think it's a lot of hype and the test is very unreliable. As for the hype (popularity), I estimate that the company processes about 5,000 genomes per week.

written 8.3 years ago by Aleksandr Levchuk3.1k
11
8.3 years ago by
David Quigley11k
San Francisco
David Quigley11k wrote:

[Edited to reflect clarification from Larry]

OmniExpress is a 933,202 marker SNP chip, using hybridization to call polymorphisms at defined loci. For a product summary (including some measures of genomic coverage, which are ultimately dependent on linkage and the population in question), see product info on Illumina's web site. I think you can run 12 samples per chip, but each sample runs in a separate part of the chip, so I don't think you'd call it multiplexed the way you would if you were mixing samples in a single lane of sequencing.

SNP-based systems give a different kind of information from sequencing, in that they only query defined polymorphisms and are more useful for genetic studies that look for signals linked to a region of the genome than for sequencing-style searches for de novo mutations.

modified 8.3 years ago • written 8.3 years ago by David Quigley11k
6
8.3 years ago by
Larry_Parnell16k
Boston, MA USA
Larry_Parnell16k wrote:

Over at GenomesUnzipped, Don Conrad, who is well known for analysis of copy number variants (CNVs), offered a guest post on analysis of his 23andMe data for CNVs. I give this information here because one determinant of cost is "How much do I get from the data?"

In terms of cost for supplies and such, that really depends on how many platforms/arrays/chips you will purchase. I am sure that 23andMe gets different pricing than you or I would. The chip that 23andMe is using is 733 K SNPs, as David writes, plus an additional 200,000 specially selected SNPs culled from literature, GWAS data and such. This chip is not likely to be available to any entity except 23andMe.

In the end, the best way to determine cost is to contact the Illumina sales rep who covers your area. If you're serious about getting into their system, then you will need to do business with this individual anyway. So, why not start now and get some real (as opposed to guesstimate) numbers?

OK, as of 16:30 on 30 Nov in Boston, it seems that "coverage of the genome" has crept into the conversation. To me this implies imputation. Imputation is not precise but offers a range of confidence values given the ancestry of the individual who has been genotyped and the nature of the available data that will be used to calculate the imputed genotypes. I have no first-hand experience with genotype imputation nor do my colleagues. So, I will leave this to others on this forum. Suffice to say, with the number of SNPs that will be tested in this example, there is much that can be imputed for a person of European ancestry.

modified 8.3 years ago • written 8.3 years ago by Larry_Parnell16k

+1 Thanks for the info. It took a while to find but here is the link that you talked about: http://www.genomesunzipped.org/2010/08/dude-where-are-my-copy-number-variants.php

written 8.3 years ago by Aleksandr Levchuk3.1k

Thanks for the clarification on 733k + 200k vs just 733k; that's potentially a big difference.

written 8.3 years ago by David Quigley11k
3
8.3 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

It is explained on their website. They genotype ~600,000 SNPs (source), and for the coverage you can look here: https://www.23andme.com/ancestry/techniques/

written 8.3 years ago by Giovanni M Dall'Olio26k
1

+1 Thanks! I had not found this page before. It may have enough info, but there is no direct answer as to what part of the genome is covered.

written 8.3 years ago by Aleksandr Levchuk3.1k
2
8.3 years ago by
Bergen, Norway
Michael Dondrup45k wrote:

This should be part of the package insert of this "diagnostic method":

Patients inquiring about genomewide association testing should be advised that at present the results of such testing have no value in predicting risk and are not clinically directive. Clinicians would do well to use the discussion as an opportunity to point out other identifiable, modifiable risk factors that motivated patients can control.12,73 Whether to heed such advice or instead undergo testing and present the physician with the test results as a fait accompli is the choice of the individual patient. A decision to undergo genomewide association testing may result in the diversion of scarce time and resources to counseling or follow-up investigation of findings.

cite: Manolio TA, NEJM (2010)

[/off-topic]

written 8.3 years ago by Michael Dondrup45k
1
8.3 years ago by
Cathy Chi10
Cathy Chi10 wrote:

SNP stands for single nucleotide polymorphism. OmniExpress can give you around 900K SNP results, like this: SNP1 AG, SNP2 TT, SNP3 AA, ... What percentage of the human genome can this technology cover? There are 3 billion base pairs in the human genome, and OmniExpress gives you information on about 900,000 of them; it is not like sequencing.

Can the technology multiplex multiple individuals into one run? Yes, Illumina puts 12 samples on one chip, but the different samples are arrayed in different areas. You cannot submit mixed DNA to 23andMe.

How much do supplies cost to do one run? Today 23andMe is offering a discount: you can buy a DNA test for $99, after which you get your report and raw data. The report tells you your risk for some diseases, personal traits, etc. Of course, you can also analyze your raw data with other methods.
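As a rough illustration of analyzing the raw data yourself, here is a minimal Python sketch. It assumes the raw file is tab-separated with four columns (marker id, chromosome, position, genotype), that comment lines start with '#', and that no-calls are written as "--"; none of that is stated in this thread, so check it against your own download. The marker names and positions in the sample are made up, and `read_genotypes` is just a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample in the ASSUMED layout: '#' comment lines, then
# tab-separated marker id, chromosome, position, genotype.
SAMPLE = """\
# rsid\tchromosome\tposition\tgenotype
SNP1\t1\t1000\tAG
SNP2\t1\t2000\tTT
SNP3\t2\t1500\tAA
SNP4\t2\t3000\t--
"""


def read_genotypes(handle):
    """Return {marker_id: genotype} for called markers, skipping comments and no-calls."""
    calls = {}
    data_lines = (line for line in handle if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        marker, _chrom, _pos, genotype = row
        if "-" in genotype:
            continue  # assumed no-call marker
        calls[marker] = genotype
    return calls


calls = read_genotypes(io.StringIO(SAMPLE))
# A two-letter genotype with different letters is heterozygous (e.g. AG).
heterozygous = [m for m, g in calls.items() if len(g) == 2 and g[0] != g[1]]
print(len(calls), "markers called;", len(heterozygous), "heterozygous")
```

To run it on a real file, pass an open file handle to `read_genotypes` instead of the `io.StringIO` sample.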

written 8.3 years ago by Cathy Chi10

So from what you're saying (900,000 letters out of 3 billion), it sounds like somewhere between .02% and .03% of the genome is covered.

written 8.3 years ago by Aleksandr Levchuk3.1k

Doesn't the technology also detect some of the letters neighboring the SNP?

written 8.3 years ago by Aleksandr Levchuk3.1k

I posted an answer that describes this in more detail.

written 8.3 years ago by Aleksandr Levchuk3.1k
1
8.3 years ago by
Cathy Chi10
Cathy Chi10 wrote:

SNPs are a kind of genetic variation associated with disease and drug response, so researchers focus on SNPs; put simply, they focus on a limited set of base pairs and ignore the rest of the sequence. Illumina's SBE (single-base extension) technique can only report one base per probe. If you want more sequence, you should sequence your genome on a HiSeq 2000, but that is much more expensive. An Illumina SNP array gives you the most important information at the lowest cost. (Current research treats SNPs as the important variation, but I'm not sure we will still say so in the future.)

written 8.3 years ago by Cathy Chi10
1
8.3 years ago by
United States
Aleksandr Levchuk3.1k wrote:

Up to now, nobody was able to answer my question precisely. Thank you for the answer submissions! With your helpful hints, I was able to collect enough information to try to reason about the question in the following way.

https://www.23andme.com/more/genotyping/ says:

The technology that we use, the Illumina OmniExpress Plus, analyzes approximately 1,000,000 SNPs that cover the entire genome.

There are 3 Billion letters in a human genome.

Illumina's OmniExpress does the Infinium DNA Analysis Assay (1)

The assay uses 50-mer probes to detect SNPs, followed by a single-letter fluorescent extension (2)

So each of the approximately 1,000,000 SNPs requires exactly 51 letters to be implicitly determined (50 identical to the probe and 1 for the SNP).

That's approximately 51 million letters in total.

Edit: This is not correct because not all letters in the 50 base region must be identical to the probe. Only the SNP letter is reliably determined. Please see the comment by David Quigley.

Conclusion:

  • Human Genome covered by 23andMe will be no more than 1.6% (3)
  • More reasonably only the SNPs should be counted. Then the coverage is .03% (4)
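
The arithmetic behind these two bullets can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the rounded figures quoted in this answer (about 1,000,000 SNPs, 3 billion letters, 50-mer probes); the names `coverage_percent`, `upper_bound` and `snp_only` are made up for illustration.

```python
# Rounded figures quoted in this answer; treat them as approximations.
GENOME_LENGTH = 3_000_000_000  # "3 Billion letters in a human genome"
SNP_COUNT = 1_000_000          # "approximately 1,000,000 SNPs"
PROBE_LENGTH = 50              # 50-mer probe next to each SNP


def coverage_percent(letters_determined, genome_length=GENOME_LENGTH):
    """Return the letters determined as a percentage of the genome length."""
    return 100.0 * letters_determined / genome_length


# Upper bound: counts all 50 probe letters plus the SNP letter, which
# overstates coverage because only the SNP letter is reliably determined.
upper_bound = coverage_percent(SNP_COUNT * (PROBE_LENGTH + 1))

# SNP-only: counts just the one assayed letter per marker.
snp_only = coverage_percent(SNP_COUNT)

print(f"probe + SNP: {upper_bound:.2f}%")
print(f"SNP only:    {snp_only:.3f}%")
```

With the rounded 1,000,000 figure the upper bound comes out at about 1.7%; using the 933,202 markers quoted in David Quigley's answer instead gives about 1.6%. The SNP-only figure stays near .03% either way.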
modified 8.3 years ago • written 8.3 years ago by Aleksandr Levchuk3.1k
2

The idea that you have the sequence for 51 nucleotides per probe is not correct. The only nucleotide in the probe that is actually assayed is the one that is intended to be polymorphic. Other loci in the 50-mer could be polymorphic, which would affect the binding properties of the probe, but you'd have no way to know what is going on.

written 8.3 years ago by David Quigley11k
2

Where the mismatch occurs drives the ability of the probe to still bind. A mismatch near the polymorphic site may bind much less efficiently than one in the probe's middle. Mismatches opposite a T or C (physically smaller) will be tolerated differently than those opposite A or G. A mismatch surrounded by higher G+C content (with 3 H-bonds/pair) is different than one in an A+T rich context. No easy answer to all this.

written 8.3 years ago by Larry_Parnell16k

Something to keep in mind is the difference between sequencing and genotyping. 23andMe and similar companies offer genotyping and so the data reflect measures of which alleles (presumably at positions where variants exist) are present. These are not sequence data and so coverage of base pairs is a bit misguided. Your 1.6% looks fine, but it's like measuring surface area of the apples in your grocery cart - correct but not the most useful.

written 8.3 years ago by Larry_Parnell16k

To David Quigley: This makes sense. Thanks! I'm making a note in the answer to reflect my error. Do you know how many mismatches in the 50-mer are permissible for the hybridization to happen?

written 8.3 years ago by Aleksandr Levchuk3.1k

Larry Parnell, Thank you.

written 8.3 years ago by Aleksandr Levchuk3.1k