The commenters are right: first decide what you want to do with the data, then check which input formats that analysis accepts.
That said, here are two conversions that might come in handy.
1. 23andMe to VCF: this is now supported by PLINK (https://www.cog-genomics.org/plink2/input#23file).
plink --23file [name of your file] --snps-only no-DI --recode vcf
What's "--snps-only no-DI", you ask? Well, 23andMe files contain mostly SNP calls, but there are a few indel calls as well. Unfortunately, the actual bases involved in the indels are NOT saved; instead, there's just 'D' for deletion and 'I' for insertion, and you'd need an indel database to determine a valid VCF representation of the call. So we just punt here and filter out all markers with 'D' or 'I' allele codes.
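If you want to see what that filter amounts to, here is a minimal Python sketch. It is not what PLINK runs internally, just an illustration. It assumes the usual 23andMe raw-data layout: '#' comment lines, then tab-separated rsid, chromosome, position, genotype. The function name drop_indel_calls is made up for this example.

```python
def drop_indel_calls(lines):
    """Yield 23andMe raw-data lines, skipping markers with 'D' or 'I' alleles.

    Assumes tab-separated columns: rsid, chromosome, position, genotype.
    Comment lines (starting with '#') and malformed lines pass through as-is.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            yield line
            continue
        genotype = fields[3]
        # Genotypes look like "AG", "--" (no call), "DI", or "A" (haploid).
        if any(allele in ("D", "I") for allele in genotype):
            continue
        yield line
```

To run it on a real file, you could do something like: with open("genome.txt") as f: kept = list(drop_indel_calls(f)). The file name here is a placeholder.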
2. VCF to FASTA:
This can be done with a combination of VCFtools (http://vcftools.sourceforge.net/) and a Perl script. See http://code.google.com/p/vcf-tab-to-fasta/ for details.
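To give a rough idea of what the tab-to-FASTA step involves, here is a Python sketch. It is not the Perl script linked above. It assumes a tab-delimited input with a "#CHROM POS REF" header followed by one column per sample, with genotypes written like A/G. It handles single-base SNP calls only, writes heterozygous calls as IUPAC ambiguity codes, and writes anything else (missing calls, indels) as N. The names genotype_to_base and tab_to_fasta are made up for this example.

```python
# IUPAC ambiguity codes for the six possible heterozygous base pairs.
IUPAC = {"AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K"}


def genotype_to_base(genotype):
    """Collapse a genotype such as 'A/G' into one FASTA character."""
    alleles = genotype.strip().replace("|", "/").split("/")
    if len(alleles) == 1:
        alleles = alleles * 2  # treat a haploid call as homozygous
    if len(alleles) != 2 or any(a not in ("A", "C", "G", "T") for a in alleles):
        return "N"  # missing call, indel, or anything unexpected
    first, second = sorted(alleles)
    return first if first == second else IUPAC[first + second]


def tab_to_fasta(lines):
    """Build one FASTA record per sample from the tab-delimited genotype rows."""
    samples = []
    sequences = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[3:]
            sequences = {sample: [] for sample in samples}
            continue
        for sample, genotype in zip(samples, fields[3:]):
            sequences[sample].append(genotype_to_base(genotype))
    return "".join(f">{s}\n{''.join(sequences[s])}\n" for s in samples)
```

Note that the resulting sequences contain only the variant sites, concatenated in file order, so this is an alignment of SNP positions rather than a full genome sequence.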