Question: How do I convert 23andMe raw genome data to GenBank or FASTA?
3.5 years ago by
someashole70
university of delaware
someashole70 wrote:

I downloaded my raw genome data from 23andMe. I have it in a .txt file, but you can't use that format with real bio programs. I want to make my own library for further analysis. Does anyone know how I can convert the .txt file to FASTA, GenBank, or any other usable file type?

 

convert help how to 23andme • 9.3k views
ADD COMMENTlink modified 8 months ago by missstrawchewwer0 • written 3.5 years ago by someashole70

Can you provide an example of what your data looks like? Then if you could provide an example of what you want the output to look like that would also help. 

ADD REPLYlink written 3.5 years ago by Jason810

snp/rs id     chrm #   position   genotype
rs4477212     1        82154      AA
rs3094315     1        752566     AG
rs3131972     1        752721     AG
rs12124819    1        776546     AA

 

There are about 960k lines.
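For anyone who wants to work with these rows directly, here is a minimal Python sketch of reading them into records. It assumes whitespace-separated columns in the order shown above and that comment lines start with "#"; the names Call and parse_23andme are made up for this example and are not part of any library.

```python
from typing import Iterable, Iterator, NamedTuple


class Call(NamedTuple):
    """One genotype call from a 23andMe-style text export."""
    rsid: str
    chrom: str
    pos: int
    genotype: str


def parse_23andme(lines: Iterable[str]) -> Iterator[Call]:
    """Yield one Call per data row, skipping comments, blanks, and header rows."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment (assumed to start with '#')
        fields = line.split()
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # header row or malformed row
        rsid, chrom, pos, genotype = fields
        yield Call(rsid, chrom, int(pos), genotype)


sample = """\
# hypothetical comment line
snp/rs id     chrm #  position    genotype

rs4477212    1         82154      AA
rs3094315    1         752566    AG
rs3131972    1         752721    AG
rs12124819  1         776546    AA
"""

calls = list(parse_23andme(sample.splitlines()))
for call in calls:
    print(call)
```

With a real file you would pass an open file handle instead of sample.splitlines().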

 

ADD REPLYlink written 3.5 years ago by someashole70

I'm not sure what the data should look like in a usable format.

ADD REPLYlink written 3.5 years ago by someashole70

The output that you get above is already the most compact form that you can get your data in. It represents the differences relative to the reference genome.

You could, for example, transform this into two haploid genomes in FASTA format, but do you realize that those files would be gigantic, many gigabytes each, and would not show you what the changes were?

The right way to go about this is to decide what you want to do next with your data. Then, depending on that aim, people here can advise what to transform it to.

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by Istvan Albert ♦♦ 74k

Could you convert this to FASTA though? Are the genotype alleles listed so that one homologous chromosome is always first and the other is always second? Or is it random? If it's random, there's no way to construct FASTA, because we don't know whether, for example, at 752566 and 752721 we have A-A and G-G or A-G and G-A.
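To make the ambiguity concrete, here is a small Python sketch that lists every pair of haplotypes consistent with a set of unphased genotype calls. The function name possible_haplotype_pairs is invented for this illustration; it only enumerates possibilities and does not attempt any real phasing.

```python
from itertools import product


def possible_haplotype_pairs(genotypes):
    """Return every distinct haplotype pair consistent with unphased calls.

    Each genotype is a two-letter string such as "AG". A heterozygous site
    can be oriented two ways, so n heterozygous sites give 2**(n-1)
    distinct pairs once the order of the two haplotypes is ignored.
    """
    options = []
    for gt in genotypes:
        a, b = gt[0], gt[1]
        options.append([(a, b)] if a == b else [(a, b), (b, a)])

    pairs = set()
    for combo in product(*options):
        hap1 = "".join(first for first, _ in combo)
        hap2 = "".join(second for _, second in combo)
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


# The two heterozygous sites from the example (752566 and 752721).
for hap1, hap2 in possible_haplotype_pairs(["AG", "AG"]):
    print(hap1, hap2)
```

For the two AG sites this yields the two alternatives mentioned above: A-A with G-G, or A-G with G-A.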

If I was going to do anything with it, I think I'd want a VCF file.

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by Emily_Ensembl13k
3.5 years ago by
chrchang5232.8k
United States
chrchang5232.8k wrote:

The commenters are correct re: first figuring out something you want to do with the data, and checking what input formats work with that.

With that said, here are two conversions that might come in handy.

1. 23andMe to VCF: this is now supported by PLINK (https://www.cog-genomics.org/plink2/input#23file).

plink --23file [name of your file] --snps-only no-DI --recode vcf

What's "--snps-only no-DI", you ask?  Well, 23andMe files contain mostly SNP calls, but there are a few indel calls as well.  Unfortunately, the actual bases involved in the indels are NOT saved; instead, there's just 'D' for deletion and 'I' for insertion, and you'd need an indel database to determine a valid VCF representation of the call.  So we just punt here and filter out all markers with 'D' or 'I' allele codes.
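As an illustration only, the filtering idea can be sketched in a few lines of Python. This is not how PLINK implements it; has_indel_code and the marker IDs below are made up for the example.

```python
def has_indel_code(genotype: str) -> bool:
    """True if any allele in the call is a 'D' (deletion) or 'I' (insertion) code."""
    return any(allele in ("D", "I") for allele in genotype)


# Hypothetical (marker ID, genotype) pairs; the IDs are placeholders.
calls = [
    ("rs_example_1", "AG"),
    ("indel_example_1", "DI"),
    ("indel_example_2", "II"),
    ("rs_example_3", "AA"),
    ("indel_example_3", "DD"),
]

# Keep only the markers whose alleles are actual bases.
snp_only = [(marker, gt) for marker, gt in calls if not has_indel_code(gt)]
print(snp_only)
```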

2. VCF to FASTA:

This can be done with a combination of VCFTools (http://vcftools.sourceforge.net/) and a Perl script.  See http://code.google.com/p/vcf-tab-to-fasta/ for details.
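To show what such a conversion involves, here is a toy Python sketch, assuming you already have a reference sequence and unphased SNP genotypes keyed by 1-based position. It writes heterozygous sites as IUPAC ambiguity codes, which is one possible convention and not necessarily what the tools above do. The names apply_snps and to_fasta and the 10-base reference are invented for the example.

```python
from typing import Dict

# Standard IUPAC ambiguity codes for two-base combinations.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def apply_snps(reference: str, genotypes: Dict[int, str]) -> str:
    """Substitute SNP genotypes (1-based positions) into a reference sequence."""
    seq = list(reference)
    for pos, gt in genotypes.items():
        alleles = frozenset(gt)
        # Homozygous: write the base itself; heterozygous: write the IUPAC code.
        seq[pos - 1] = gt[0] if len(alleles) == 1 else IUPAC[alleles]
    return "".join(seq)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a single FASTA record with wrapped lines."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


reference = "ACGTACGTAC"  # toy reference, not real genomic sequence
consensus = apply_snps(reference, {2: "AG", 5: "TT"})
print(to_fasta("sample", consensus), end="")
```

On a real genome the reference would come from a FASTA file of the same build as the 23andMe positions, and indels would need separate handling.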

ADD COMMENTlink written 3.5 years ago by chrchang5232.8k

Once you have the VCF, check the answers in this discussion: New Fasta Sequence From Reference Fasta And Variant Calls File?

ADD REPLYlink written 3.5 years ago by Giovanni M Dall'Olio25k

And my blog post on conversion to VCF for use with the Ensembl Variant Effect Predictor.

ADD REPLYlink written 3.5 years ago by Neilfws47k
20 months ago by
jeffreyice11050 wrote:

The issue I have with --23file is that I get "invalid chromosome code 85280 on line 585542 of .bim file."

(Use --allow-extra-chr to force it to be accepted.)

But when I add that, it tells me I can't use --allow-extra-chr because it cannot currently be used with --23file.

Anybody know how to fix this?

Jeff
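One way to track down an error like this is to look for rows in the raw file whose chromosome column is not a value you expect. The Python sketch below assumes the chromosome is the second whitespace-separated column and that valid codes are 1 to 22, X, Y and MT; both are assumptions about the export format, and the function name and sample rows are hypothetical.

```python
# Assumed set of valid chromosome codes in a 23andMe export.
EXPECTED = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def unexpected_chromosome_lines(lines):
    """Return (line number, line) for data rows with an unexpected chromosome code."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2 or fields[1] not in EXPECTED:
            problems.append((number, line.rstrip("\n")))
    return problems


# Hypothetical rows; the last one mimics a row whose columns have shifted.
sample = [
    "# rsid\tchromosome\tposition\tgenotype\n",
    "rs4477212\t1\t82154\tAA\n",
    "rs0000000\t85280\t752566\tAG\n",
]

problems = unexpected_chromosome_lines(sample)
for number, line in problems:
    print(number, line)
```

If this flags a row near the reported line, the file itself is probably damaged or edited at that point.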

ADD COMMENTlink modified 20 months ago • written 20 months ago by jeffreyice11050
8 months ago by
missstrawchewwer0 wrote:

Jeff, try this:

plink --23file 23andmefile.txt Surname Firstname Sex --snps-only no-DI --make-bed --out plink_genome

ADD COMMENTlink written 8 months ago by missstrawchewwer0