Create PLINK files from 23andMe JSON files
9.6 years ago

Hi, I have several 23andMe files in JSON format that I want to merge in order to create a ped and map file set for my PLINK analyses.

Is there any existing tool that does this job?

(I'm trying to avoid the 4-column 23andMe format because some individuals are missing some of the SNPs).

All the best,

Yorgos

plink JSON SNP 23andMe • 3.8k views

Geia sou Yorgo,

Can you post a short example of a unit of data from the JSON file, and a brief example of the output that you would like to get from that sample data unit?


Geia sou Deedee,

JSON files are really huge but they look like this:

{"id": "some_id_label", "genome":"__AAGGAAAAAAAAAA__AA__GGAAAA__AAAAAAAA__AAAAAA__AAAAAAAAAAAA__AAAA__AAAA____AA__AAAACCTTTT__CC__CC__CCCCAA____CCTT____TTCC__CC________CC____CCCCCCCCCC______GGGG__GGGGGGGG__GGGGAA__GGAAGGGGGGGGGGGGGG____GG____GGGGAAGGGGGG__GGGGGGGGGGGGGGGGGGGGGGGG__AAGG__TTTTTTTTTTTTTT__CCTTTTTTTTTTTT__CCTTTTTTTTTTTTTTTT__TTTT__TTCC__TTTT__TTTT__TTTTTTTTTT______TT____TTAG__GTGGGTTTTTCTAG__GGAG__AA____CTGG__CCCC__TTTTCCTTAGCCAG__CTCTCCTTAA__CTGTCCCTCCAGCCGGAA__CTGGCCTT__CCAATTCCCT__GGTTAGAATTAATTGGACGGGGCC__GGTTTT__GGCTGG____AAAACTCTTTGGCC____AAGGCCAAAGTTCT__AACC__AACCTTCT__AA__TTAAAAGTAACTAGAGAGAGCCTTCT______CC__AA__ACGG__TTGGGG__GGTTCC__AAACGGTTGG____GGCTGGCCTT__AACTAGAATTCCTTGG__AG__GGTTCCCCCCAGAACTAATTAGAG__GGCC__GGACCTAAAACTGGTTTTCTCCCTCC__CTAACTTTAG______TTAA__AAAGGG__CC__GGGGAA__AAAA__GGCT____GG__AA____AAAGCTAATT

Each pair of letters corresponds to one locus (mostly SNPs but sometimes also indels). Double underscore corresponds to missing genotype. We need a MAP file to understand the JSON files correctly.
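As a rough illustration of the encoding described above, here is a small Python sketch that splits a genome string into two-character genotype calls and marks `__` as missing. The function name `split_genotypes` and the shortened sample record are made up for this example, and it assumes every locus occupies exactly two characters, which may not hold for every export.

```python
import json


def split_genotypes(genome):
    """Split a genome string into per-locus calls; '__' becomes None (missing)."""
    if len(genome) % 2 != 0:
        raise ValueError("genome string length must be even (two characters per locus)")
    calls = []
    for i in range(0, len(genome), 2):
        pair = genome[i:i + 2]
        calls.append(None if pair == "__" else pair)
    return calls


# Hypothetical record in the same shape as the example above, shortened to five loci.
record = json.loads('{"id": "some_id_label", "genome": "__AAGGCT__"}')
calls = split_genotypes(record["genome"])
print(calls)
```

The position of each call in the list is the index that would have to be looked up in the accompanying MAP or key file.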

PED files include the following fields (one line per individual):

Family_ID Subject_ID Father_ID Mother_ID Sex Disease_Status SNP1_allele1 SNP1_allele2 SNP2_allele1 SNP2_allele2 etc...

MAP files include the following fields:

Chromosome SNP_ID Genetic_distance BP_position

(genetic distance is irrelevant and can be set to 0).
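To make the two layouts concrete, here is a minimal Python sketch that formats one PED line from an id and a genome string, and one MAP line per locus. The helper names `ped_line` and `map_line`, the sample id, and the three-entry `key` list are hypothetical; the real locus order would have to come from the key file that matches the JSON export. The sketch assumes the usual PLINK 1.x text conventions: `0` for unknown parents and sex, `-9` for a missing phenotype, and `0 0` for a missing genotype.

```python
def ped_line(sample_id, genome):
    """Build one space-delimited PED line from a two-characters-per-locus genome string."""
    alleles = []
    for i in range(0, len(genome), 2):
        pair = genome[i:i + 2]
        if pair == "__":
            alleles += ["0", "0"]          # missing genotype
        else:
            alleles += [pair[0], pair[1]]  # allele 1, allele 2
    # Family_ID Subject_ID Father_ID Mother_ID Sex Disease_Status, then the alleles.
    fields = [sample_id, sample_id, "0", "0", "0", "-9"] + alleles
    return " ".join(fields)


def map_line(chrom, snp_id, bp_position):
    """Build one MAP line; genetic distance is set to 0."""
    return f"{chrom} {snp_id} 0 {bp_position}"


# Hypothetical key: one (chromosome, SNP id, base-pair position) entry per locus, in string order.
key = [("1", "rs0000001", 1000), ("1", "rs0000002", 2000), ("2", "rs0000003", 3000)]

print(ped_line("some_id_label", "AG__CC"))
for chrom, snp_id, bp in key:
    print(map_line(chrom, snp_id, bp))
```

Using the sample id for both Family_ID and Subject_ID is just one common choice for unrelated individuals; adjust it if the samples belong to known families.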

I was hoping that there might be some statistical package or tool that does this job instead of having to write code from scratch.

All the best,

Yorgos


I see. So if "id" and "genome" are the only two properties for each data unit, then it's obviously not a translation of key-value pairs to a flat table.

I don't know of any tool that can do the processing work, but I'll check around as soon as I have time. Thanks for uploading that!

9.6 years ago
Kizuna ▴ 870

Hi,

I do not know what you mean by the 4-column 23andMe format, but here is something you can do in R to go from .JSON files to a merged data frame (called merged.file in this example) that you can then use to build your .ped and .map files (I think .ped and .map are tab-delimited text files):

install.packages("jsonlite")
install.packages("plyr")
library(jsonlite); library(plyr)

file1 <- fromJSON("/../.JSON")   # repeat for file2, file3, ..., or loop over your file paths instead of typing them all
dfList <- list(file1, file2)     # collect all parsed files in a list named dfList (add the remaining files here)
merged.file <- join_all(dfList)  # join them all on their common columns

Once joined, you can manipulate the merged data to create a .ped and a .map file.

Hope this helps!

Kiz


Thank you Kizuna! I will try this!

Yorgos

9.6 years ago

If you don't have accompanying key files for the JSONs, you'll probably need to re-grab the genomic data. See here; that includes a link to the current genome-string-index-to-variant-info file, but it's updated every once in a while. Since you mention that some of your JSONs are missing some SNPs, it sounds like they aren't all compatible with the current key.

If you do re-grab the data, choose 4-column format if at all possible since PLINK 1.9 explicitly supports it: https://www.cog-genomics.org/plink2/input#23file
