23andme file format
5.1 years ago
bkellman

What, if any, is the official name of the file format that 23andMe uses? I've only seen it referred to as "23andMe format". Do they comply with a more general format? If so, which one? If not, why did they invent a new format?

Tags: 23andme, vcf, variant, format

> If not, why did they invent a new format?

An oldie but goodie…

[image]


What if the variant is a deletion or insertion, particularly one longer than a single nucleotide? How could you distinguish an insertion of -/TT from a T/T SNP? Does anybody have a real sample file to look at?
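
Without documentation on how indels are encoded, one defensive approach is to accept only genotypes made of the letters A, C, G and T (one letter for a haploid call, two for a diploid one) as SNP calls, and set everything else aside for manual inspection. A T/T SNP then appears as "TT", while anything like "-/TT" is flagged instead of being misread. This is a minimal sketch under that assumption; the rule and the classify_genotype name are illustrative, and the actual encoding should be checked against a real file.

```python
import re

# Assumption: a plain SNP genotype is one (haploid) or two (diploid) letters from A/C/G/T.
SNP_GENOTYPE = re.compile(r"[ACGT]{1,2}")

def classify_genotype(genotype: str) -> str:
    """Return 'snp' for a plain A/C/G/T call, otherwise 'other'."""
    if SNP_GENOTYPE.fullmatch(genotype):
        return "snp"
    # Indel codes, no-calls or anything unexpected: inspect by hand rather than guess.
    return "other"
```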

5.1 years ago
Tonor

According to this page: http://fileformats.archiveteam.org/wiki/23andMe

Raw genetic data is provided as a tab-delimited file (zipped for distribution) containing the fields rsid, chromosome, position and genotype (e.g. rs3094315 1 742429 AG).

Comment lines begin with the # character.
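
A minimal reader for the layout described above might look like this. It assumes exactly four tab-separated fields per data line, as in the example, and raises ValueError on anything else; the Record and parse_23andme names are illustrative, not part of any 23andMe tool.

```python
from typing import Iterable, Iterator, NamedTuple

class Record(NamedTuple):
    """One data line: rsid, chromosome, position, genotype."""
    rsid: str
    chromosome: str
    position: int
    genotype: str

def parse_23andme(lines: Iterable[str]) -> Iterator[Record]:
    """Yield a Record for each data line, skipping '#' comments and blank lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        # Unpacking fails with ValueError if a line does not have exactly four fields.
        rsid, chromosome, position, genotype = line.split("\t")
        yield Record(rsid, chromosome, int(position), genotype)
```

It accepts any iterable of lines, such as an open file object. For example, the lines "# comment" and "rs3094315\t1\t742429\tAG" yield a single Record(rsid='rs3094315', chromosome='1', position=742429, genotype='AG').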

The file name is of the form genome_Firstname_Lastname_20012345678901.txt, zipped as genome_Firstname_Lastname_20012345678901.zip.
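
If you would rather not unzip by hand, Python's standard zipfile module can read the text member straight from the archive. This is a sketch under stated assumptions: the filename regex is a hypothetical pattern inferred only from the single example name above (it assumes the name parts contain no underscores, and the meaning of the trailing number is not stated), and UTF-8 is an assumed encoding.

```python
import io
import re
import zipfile
from typing import Dict, Iterator

# Hypothetical pattern inferred from genome_Firstname_Lastname_20012345678901.txt;
# it assumes neither name part contains an underscore.
FILENAME_RE = re.compile(
    r"genome_(?P<first>[^_]+)_(?P<last>[^_]+)_(?P<number>\d+)\.(?:txt|zip)"
)

def parse_filename(name: str) -> Dict[str, str]:
    """Split a 23andMe-style file name into first-name, last-name and number parts."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return match.groupdict()

def iter_zipped_lines(zip_source) -> Iterator[str]:
    """Yield the text lines of the first .txt member of the archive.

    zip_source may be a path or a binary file object (zipfile.ZipFile accepts both).
    UTF-8 is an assumed encoding.
    """
    with zipfile.ZipFile(zip_source) as archive:
        members = [n for n in archive.namelist() if n.endswith(".txt")]
        if not members:
            raise ValueError("no .txt member found in the archive")
        with archive.open(members[0]) as raw, io.TextIOWrapper(raw, encoding="utf-8") as text:
            yield from text
```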


Yes, that is the information the file contains. I'm asking whether the file format has a name, like VCF, BAM, SAM...


No, this is not one of the standard NGS formats, but it should be relatively straightforward to convert it to VCF (you might have to add dummy values for some of the fields). See here for the VCF specs.
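
Because the file lists observed genotypes but no reference allele, a VCF conversion needs one extra input: the reference base at each position. The sketch below assumes that base is supplied by the caller (for example, looked up in a reference genome on the same assembly as the file's coordinates; how you obtain it is left open), handles only plain A/C/G/T calls, and fills QUAL, FILTER and INFO with dummy "." values. The VCF_HEADER and to_vcf_line names are hypothetical.

```python
from typing import Optional

# Minimal VCF header; {sample} is filled in with str.format.
VCF_HEADER = (
    "##fileformat=VCFv4.2\n"
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t{sample}"
)

def to_vcf_line(rsid: str, chromosome: str, position: int,
                genotype: str, ref: str) -> Optional[str]:
    """Return one VCF data line for a plain A/C/G/T call, or None otherwise.

    `ref` is the reference base at `position`. It is not in the 23andMe file, so the
    caller must look it up in a reference genome on the same assembly (assumed input).
    """
    if not 1 <= len(genotype) <= 2 or any(base not in "ACGT" for base in genotype):
        return None  # not a plain SNP call: skip rather than guess
    # ALT alleles are the observed bases that differ from REF, in order of appearance.
    alts = []
    for base in genotype:
        if base != ref and base not in alts:
            alts.append(base)
    alleles = [ref] + alts
    # GT lists allele indices (0 = REF), unphased; one index if haploid, two if diploid.
    gt = "/".join(str(alleles.index(base)) for base in genotype)
    alt_field = ",".join(alts) if alts else "."
    # QUAL, FILTER and INFO get dummy "." values: the source file has nothing for them.
    return "\t".join([chromosome, str(position), rsid, ref, alt_field,
                      ".", ".", ".", "GT", gt])
```

Writing VCF_HEADER.format(sample=...) followed by one to_vcf_line result per record gives a minimal single-sample VCF. Chromosome names may also need renaming to match the contigs of whatever reference you use.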

ADD REPLY
0
Entering edit mode

Parsing to VCF is indeed straightforward. I was hoping to develop an app using 23andMe data, but to get the data into the environment it needs to be in a known NGS format, and I was hoping it already was. Rather than asking the platform operators to accept this very specific data type, I wanted to ask the broader, more general question: "Can we include *.xxx files?" I'll see what they say. Thanks.


> The file name is of the form genome_Firstname_Lastname_20012345678901.txt, zipped as genome_Firstname_Lastname_20012345678901.zip.

It is just a plain .txt text file.


I don't think there is an official name for that particular style.
