How do I phase my own 23andMe data?
4.4 years ago
LouisJoubert ▴ 10

Hi, I am new to this type of analysis, but I am really interested in learning.

I had my DNA genotyped by 23andMe and I want to play around with my data a little. My final goal is to compare my personal data to a population dataset, so I am taking it step by step.

So my questions are:

1) Do I need to phase my data to do what I want afterwards?

2) If so, how can I do this? Using SHAPEIT or something else?

Thanks in advance to those taking the time to answer me.

Louis

4.4 years ago

You'll need some additional data, like 1000 Genomes data.

I can only offer limited support for the code, but I have some example code for this here:

https://github.com/cwarden45/DTC_Scripts/tree/master/23andMe/Ancestry_plus_1000_Genomes

and here:

https://github.com/cwarden45/DTC_Scripts/tree/master/Genes_for_Good/RFMix_ReAnalysis

The RFMix portion is also based upon this code:

https://github.com/armartin/ancestry_pipeline

I would actually recommend Alicia Martin's code, since it is easier to read (although I have left some pointers in the Issues section about things that confused me):

https://github.com/armartin/ancestry_pipeline/issues?utf8=%E2%9C%93&q=


Thanks, I will take a look at that!


Hi again,

How do you convert your raw 23andMe data to VCF? Did you remove duplicates?

What are the differences between the vcf.gz file you use in your scripts (ALL.chip.omni_broad_sanger_combined.20140818.snps.genotypes.vcf) and the files used in Alicia Martin's code (the same ones used in Kevin Blighe's tutorial)?

Thanks


plink can read the 23andMe file format directly (and there are probably other ways to perform the conversion). However, I essentially wrote some custom code, which excludes indels because I didn't know their REF and VAR sequences.
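
To make the filtering step concrete, here is a minimal Python sketch (not the code from the repository above). It assumes the usual 23andMe raw export layout: `#` comment lines followed by tab-separated rsid, chromosome, position, and genotype columns, with no-calls written as `--` and indels as `I`/`D`. The function name `read_23andme_snps` and the example rows are made up for illustration. It also drops repeated positions, which is one simple way to handle duplicates.

```python
import io

def read_23andme_snps(handle):
    """Yield (rsid, chrom, pos, genotype) for called SNP genotypes only."""
    seen = set()  # (chrom, pos) pairs already emitted
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # malformed row
        rsid, chrom, pos, genotype = (f.strip() for f in fields[:4])
        # Keep only A/C/G/T calls: this drops no-calls ("--") and
        # indels ("I"/"D"), whose REF/ALT sequences are not in the file.
        if not genotype or any(base not in "ACGT" for base in genotype):
            continue
        if (chrom, pos) in seen:
            continue  # keep the first record seen at each position
        seen.add((chrom, pos))
        yield rsid, chrom, int(pos), genotype

# Hypothetical rows, not real genotypes.
example = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1\t1\t100\tAG\n"
    "rs2\t1\t200\t--\n"
    "i3\t1\t300\tDI\n"
    "rs4\t2\t400\tCC\n"
    "rs4dup\t2\t400\tCC\n"
    "rs5\tX\t500\tT\n"
)
records = list(read_23andme_snps(example))
for record in records:
    print(record)
```

For a real file, pass `open("your_raw_data.txt")` instead of the `io.StringIO` example. Note that the file itself does not say which allele is REF, so writing a VCF still requires a reference genome or a reference panel.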

So, I am not saying that you should use my code directly. However, you should take the time to understand every step of the conversion, and I am OK with you taking some pointers from the code (although I would appreciate an acknowledgement if you do). For my part, I realize that there is a limit to the support I can provide, and therefore to how much credit I can or should receive.

As for your question about which 1000 Genomes dataset was used, I selected the Omni array file because its uncompressed version is much smaller than a multi-sample .vcf based on Illumina sequencing.

I don't think Alicia added any new samples, so she didn't have to worry about combining files in different formats. Unfortunately, that is exactly what your question is about. However, combining the two sets of code may help you create a merged file that is compatible with downstream analysis.
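
As a rough illustration of what combining the two formats involves, the Python sketch below matches personal genotypes to the sites of a reference-panel VCF by chromosome and position and recodes them as VCF `GT` values. It is a simplified sketch, not the method used in either repository. It assumes both files are on the same genome build and strand, handles only biallelic SNPs and diploid calls, and uses made-up names (`vcf_sites`, `to_gt`) and made-up records.

```python
import io

def vcf_sites(handle):
    """Map (chrom, pos) -> (ref, alt) for biallelic SNP records in a VCF."""
    sites = {}
    for line in handle:
        if line.startswith("#"):
            continue  # meta-information and column header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # match 23andMe-style chromosome names
        if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT":
            sites[(chrom, int(pos))] = (ref, alt)
    return sites

def to_gt(genotype, ref, alt):
    """Recode e.g. 'AG' as '0/1'; return None if an allele matches neither."""
    codes = {ref: "0", alt: "1"}
    if len(genotype) != 2 or any(base not in codes for base in genotype):
        return None  # haploid call, or possible strand/build mismatch
    # "/" marks an unphased genotype; phasing would turn it into "|".
    return "/".join(sorted(codes[base] for base in genotype))

# Hypothetical reference-panel records and personal genotypes.
panel = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\trs1\tA\tG\t.\tPASS\t.\n"
    "2\t400\trs4\tT\tC\t.\tPASS\t.\n"
    "3\t900\trs9\tG\tA\t.\tPASS\t.\n"
)
personal = {("1", 100): "GA", ("2", 400): "CC", ("5", 700): "TT"}

sites = vcf_sites(panel)
shared = {}
for key, genotype in personal.items():
    if key in sites:
        gt = to_gt(genotype, *sites[key])
        if gt is not None:
            shared[key] = gt
print(shared)
```

Sites that are missing from the panel, or whose alleles match neither REF nor ALT, are simply left out here. In a real analysis those mismatches deserve a closer look before merging.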
