Question

Interesting things to do with 23andMe data

4.4 years ago

LizzAlice ▴ 60

Hi,

I am a bioinformatics student, but unfortunately I have gained no practical experience so far, and my lectures were not very informative. So I decided to start doing something fun in my free time to get some experience. I recently used 23andMe, so now I have a file with SNPs. I would really appreciate suggestions about which questions could be answered with this, as well as recommendations for tools. I was thinking about programming some kind of pipeline in Python. Thanks!

SNP 23andme • 2.9k views

updated 4.2 years ago by benformatics 3.9k • written 4.4 years ago by LizzAlice ▴ 60

score 3 · Answer 1 · 2019-12-03

If the SNPs are reported with dbSNP or other IDs, use tools and APIs to find the chromosome, position, reference allele, and alternative allele for each one. Try searching for clinical annotations in ClinVar or other resources. Join with dbNSFP or other functional databases to predict the biological consequences of variants in splicing or coding regions. See what the frequencies of your variants are in larger cohorts, like gnomAD, 1000 Genomes, or HapMap. Try converting your coordinates from one reference build (e.g., GRCh37) to another (e.g., GRCh38). Do other organisms have similar variants in orthologous genes? Explore this and other evolutionary questions using the UCSC Genome Browser.
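As a starting point for any of these lookups, the raw file first has to be read into memory. Here is a minimal sketch in Python, assuming the usual raw-data layout of `#` comment lines followed by four tab-separated columns (rsid, chromosome, position, genotype). The `Snp` type, the `read_23andme` helper, and the sample rows are made up for illustration; they are not part of any 23andMe tooling.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Snp(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def read_23andme(handle) -> Iterator[Snp]:
    """Yield one Snp per data row, skipping '#' comment lines and blank lines."""
    data_lines = (
        line for line in handle
        if line.strip() and not line.startswith("#")
    )
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) != 4:
            continue  # ignore rows that do not match the assumed layout
        rsid, chromosome, position, genotype = row
        yield Snp(rsid, chromosome, int(position), genotype)


# Made-up rows in the assumed layout (not real variants or positions).
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t1\t2000\t--\n"
    "i0000003\tMT\t3000\tC\n"
)

snps = list(read_23andme(io.StringIO(SAMPLE)))
# "--" marks a no-call in this layout, so drop those before any lookup.
called = [s for s in snps if "-" not in s.genotype]
print(len(snps), "rows read,", len(called), "with a called genotype")
```

For a real file you would pass `open(path)` instead of the `io.StringIO` sample. The comment lines at the top of the file should state which reference build the positions refer to, which matters before joining against any database.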

I think this is a great idea for someone diving into bioinformatics a bit! There are lots of resources, including the Biostars Handbook, that can help you if you get stuck.

score 3 · Answer 2 · 2019-12-03

I think working with your raw data is usually a good idea, but getting the most out of it may require a non-trivial time commitment.

That said, there are some things that don't require coding experience. For example, here are some links from this blog post:

If you are OK with making your data publicly available, you can also generate a GET-Evidence report from the Personal Genome Project (and you can see my data as an example here).

I am currently looking into impute.me. My preliminary guess from MySeq and the 23andMe diabetes report is that the polygenic risk score (PRS) percentiles may be helpful for critically assessing your data, but may not be the most useful results (although I am sure there must be some exceptions). I think they also provide some other things. However, even with the $5 donation, I don't have results for what I submitted yesterday.

I also think DNA.land is making some changes, but I believe most of the other links that I have provided have free options.

In general, I would recommend against options that I have seen to re-analyze your data for a charge:

For example, you can see some concerns that I list for GenoPalate on GitHub
I also have some general warnings in this collection of blog posts
While they aren't all about 23andMe re-analysis, I think @kristenvbrown has some good coverage of genomics companies to be cautious about (such as this story for Orig3n)

There might also be exceptions that I don't know about. However, all other things being equal, I think the best approach is to learn about your data in greater detail with free options (while learning more coding and biology).

score 2 · Answer 3 · 2020-02-11

4.2 years ago

benformatics 3.9k

You can convert your results into a VCF, plug it into Ensembl VEP, and then take a look at the nonsense/missense mutations that affect protein-coding genes. However, I would take the results with a grain of salt.
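The conversion step could look roughly like the sketch below. The raw file only lists genotypes, so the reference allele at each position has to come from somewhere else (for example, a FASTA of the matching reference build). Here that is stood in for by a hypothetical `ref_alleles` dictionary keyed by (chromosome, position). The function names and the sample rows are also made up, and the sketch assumes the genotypes are reported on the same strand as the reference.

```python
from typing import Dict, Iterable, List, Optional, Tuple

VCF_HEADER = [
    "##fileformat=VCFv4.2",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
]


def to_vcf_line(rsid: str, chrom: str, pos: int,
                genotype: str, ref: str) -> Optional[str]:
    """Return one VCF record, or None for no-calls, indels and ref-only sites."""
    if not genotype or any(base not in "ACGT" for base in genotype):
        return None  # "--" (no-call) and "I"/"D" (indel) codes are skipped
    alts = sorted(set(genotype) - {ref})
    if not alts:
        return None  # homozygous reference: nothing to annotate
    alleles = [ref] + alts
    # Allele indices: 0 is REF, 1.. are the ALT alleles in listed order.
    gt = "/".join(str(i) for i in sorted(alleles.index(b) for b in genotype))
    return "\t".join(
        [chrom, str(pos), rsid, ref, ",".join(alts), ".", ".", ".", "GT", gt]
    )


def to_vcf(snps: Iterable[Tuple[str, str, int, str]],
           ref_alleles: Dict[Tuple[str, int], str]) -> List[str]:
    """Build VCF lines for every SNP whose reference allele is known."""
    lines = list(VCF_HEADER)
    for rsid, chrom, pos, genotype in snps:
        ref = ref_alleles.get((chrom, pos))
        if ref is None:
            continue  # no reference base available for this position
        line = to_vcf_line(rsid, chrom, pos, genotype, ref)
        if line is not None:
            lines.append(line)
    return lines


# Made-up example rows and reference bases (not real variants).
records = to_vcf(
    [("rs0000001", "1", 1000, "AG"),
     ("rs0000002", "1", 2000, "--"),
     ("rs0000003", "1", 3000, "CC")],
    {("1", 1000): "A", ("1", 2000): "G", ("1", 3000): "C"},
)
print("\n".join(records))
```

Only sites with at least one non-reference allele are written, since those are the ones an annotator has something to say about. A real VCF would normally also carry contig and reference metadata in the header, which this sketch leaves out.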

Another variant annotation alternative is OpenCRAVAT, which can run directly on the 23andMe files.

4.2 years ago by Collin ▴ 1000

score 1 · Answer 4 · 2019-12-04

4.4 years ago

WouterDeCoster 47k

Things that I did with my 23andMe data:

Look at my APOE allele (major risk locus for Alzheimer's disease)
Look at heterozygous recessive alleles (carrier status, could be important if you are thinking about having kids)
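The APOE check can be sketched in a few lines of Python. APOE ε2/ε3/ε4 alleles are commonly defined by two SNPs, rs429358 and rs7412: ε2 is T/T, ε3 is T/C, and ε4 is C/C at those two sites respectively. The sketch assumes both SNPs are present in your file and reported on the forward strand, and it ignores the very rare ε1 haplotype, so a double heterozygote is reported as ε2/ε4. The `apoe_genotype` helper and the example genotypes are made up for illustration.

```python
from typing import Dict, Optional


def apoe_genotype(genotypes: Dict[str, str]) -> Optional[str]:
    """Infer an APOE genotype such as 'e3/e4' from rs429358 and rs7412.

    Returns None if either SNP is missing, a no-call, or the combination
    would require the rare e1 haplotype.
    """
    g_429358 = genotypes.get("rs429358", "")
    g_7412 = genotypes.get("rs7412", "")
    if len(g_429358) != 2 or len(g_7412) != 2:
        return None
    if set(g_429358 + g_7412) - {"C", "T"}:
        return None  # no-call or unexpected allele code
    e4 = g_429358.count("C")  # each C at rs429358 marks an e4 allele
    e2 = g_7412.count("T")    # each T at rs7412 marks an e2 allele
    if e4 + e2 > 2:
        return None  # only explainable with e1, which is not handled here
    e3 = 2 - e4 - e2
    # Genotypes are unphased: C/T at both sites is reported as e2/e4,
    # although e1/e3 would give the same calls.
    return "/".join(["e2"] * e2 + ["e3"] * e3 + ["e4"] * e4)


# Made-up example genotypes keyed by rsid.
example = {"rs429358": "CT", "rs7412": "CC"}
print(apoe_genotype(example))
```

The dictionary can be built from the parsed raw file by mapping each rsid to its genotype string.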

Yes – good point. I think rare disease carrier status is an excellent example of a robust genomics application.

I also checked my APOE status, and I did some extra research to better understand what data is being used for the risk associations. I have somewhat similar blog posts for moderate-to-high risk cancer genes, in terms of population frequency and/or risk estimates.

4.4 years ago by Charles Warden 8.2k