Question: How to convert VCF to 23andMe format
alec_djinn wrote (18 months ago):

I have a VCF file (format VCFv4.0) generated by a GATK pipeline from Illumina reads.

I need to convert it to the 23andMe file format. Example of the 23andMe format:

# rsid       chromosome  position  genotype
rs4477212    1           82154     TT
rs3094315    1           752566    TC
rs3131972    1           752721    AA
rs12124819   1           776546    AC

I am having problems with plink2: it stops with the error "--recode 23 cannot be used with multi-char alleles". Plink was recommended earlier in a comment on the thread "Conerting vcf to 23andMe format".

I then tried to remove the multi-char alleles from the VCF using VcfMultiToOneAllele. It did a great job, but the output file, even though it looks like a VCF, was not recognised as such by plink2, which reported "no genotype data in .vcf file". Is any other tool up to the task?

Thanks for any help.

alec_djinn wrote (18 months ago):

OK, it seems I have solved it using:

plink2 --vcf [vcf file] --snps-only --recode 23

Now I am thinking about how to include single-base deletions and insertions in the output file, because those are missing.
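For anyone who would rather see what the SNP-only conversion boils down to, here is a minimal Python sketch. It is not how plink2 implements `--recode 23`; the function name `vcf_line_to_23andme` is made up for illustration. It assumes a tab-separated VCF, uses only the first sample column, writes no-calls as "--", and invents a `chrom:pos` placeholder when the ID column is ".". All of those choices are assumptions, not something stated in this thread.

```python
def vcf_line_to_23andme(line):
    """Convert one VCF data line to a 23andMe-style row, or None if skipped.

    Assumptions: tab-separated input, first sample column only, and only
    sites whose REF and ALT alleles are all single A/C/G/T bases are kept.
    """
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 10:
        return None  # header line, or no FORMAT/sample columns
    chrom, pos, vid, ref, alt = fields[:5]
    alleles = [ref] + alt.split(",")
    if any(len(a) != 1 or a not in "ACGT" for a in alleles):
        return None  # multi-char, missing (".") or symbolic allele
    keys = fields[8].split(":")
    if "GT" not in keys:
        return None  # no genotype field to convert
    gt = fields[9].split(":")[keys.index("GT")]
    calls = gt.replace("|", "/").split("/")
    if "." in calls:
        genotype = "--"  # assumed no-call notation
    else:
        genotype = "".join(alleles[int(i)] for i in calls)
    # Placeholder identifier for records without an rsID (an assumption).
    rsid = vid if vid != "." else f"{chrom}:{pos}"
    return "\t".join([rsid, chrom, pos, genotype])


if __name__ == "__main__":
    example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3\tGT:GQ\t1|0:48"
    print(vcf_line_to_23andme(example))
```

Indels are dropped here for the same reason `--snps-only` drops them: a multi-character allele cannot be written as a single genotype letter without some extra convention.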

Philipp Bayer wrote (18 months ago):

That doesn't look like a vcf file to me - 'multi-char alleles' appear when you have more than one alternative allele, which should be impossible if it's for a single human like 23andMe files are. Are you sure your example is your vcf file?

It should look like:

##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS     ID        REF    ALT     QUAL FILTER INFO                              FORMAT      NA00001        NA00002        NA00003
20     14370   rs6054257 G      A       29   PASS   NS=3;DP=14;AF=0.5;DB;H2           GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20     17330   .         T      A       3    q10    NS=3;DP=11;AF=0.017               GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3   0/0:41:3
20     1110696 rs6040355 A      G,T     67   PASS   NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2   2/2:35:4
20     1230237 .         T      .       47   PASS   NS=3;DP=13;AA=T                   GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
20     1234567 microsat1 GTCT   G,GTACT 50   PASS   NS=3;DP=9;AA=G                    GT:GQ:DP    0/1:35:4       0/2:17:2       1/1:40:3

The example that I posted was an example of the 23andMe format, to which I want to convert my VCF file.

This is a part of my VCF file that plink2 flagged as containing multi-char alleles:

  #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  AM1
    1       814264  .       C       A       1121    PASS    BRF=0.27;FR=1.0000;HP=1;HapScore=1;MGOF=58;MMLQ=29;MQ=42.01;NF=19;NR=10;PP=1066
    1       814297  .       TGCT    ACTA    256     alleleBias      BRF=0.23;FR=0.5000;HP=1;HapScore=1;MGOF=71;MMLQ=25;MQ=45.1;NF=3;NR=0;P
    1       814371  .       GTGTT   C       1093    PASS    BRF=0.2;FR=0.4000;HP=2;HapScore=2;MGOF=47;MMLQ=28;MQ=41.43;NF=17;NR=7;PP=993;Q
Reply written 18 months ago by alec_djinn.

Aah, I see. Have you tried removing all lines where there are indels, i.e., where the ALT field has more than one letter (here: ACTA)? I don't think 23andMe has those.
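A filter like that can be sketched in a few lines of Python. This is only an illustration under assumptions: the generator name `keep_snp_records` is made up, the input is assumed to be tab-separated, and the sketch also checks the REF column, since a record such as GTGTT > C has a single-letter ALT but a multi-letter REF.

```python
def keep_snp_records(lines):
    """Yield header lines unchanged, plus records with single-base alleles only.

    A record is kept when REF and every comma-separated ALT allele are exactly
    one character long. Note that an ALT of "." (no alternate allele) passes.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all meta-information and the column header
            continue
        fields = line.split("\t")
        ref, alts = fields[3], fields[4].split(",")
        if len(ref) == 1 and all(len(a) == 1 for a in alts):
            yield line


if __name__ == "__main__":
    # Hypothetical file names; the output is still a VCF.
    with open("input.vcf") as src, open("snps_only.vcf", "w") as dst:
        dst.writelines(keep_snp_records(src))
```

On the three records quoted earlier in this thread, only the C > A record at position 814264 would survive.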

Reply written 18 months ago by Philipp Bayer.

I actually want that info as well.

Reply written 18 months ago by alec_djinn.
Shab86 wrote (18 months ago):

There are a couple of ways to convert a 23andMe dataset to VCF:

  1. Download the 23andMe dataset as a tab-delimited file with just these columns: marker ID, chromosome name, position, and genotype. Then use bcftools to convert that TSV file to VCF:

bcftools convert --tsv2vcf input.gz -f ref.fa -s SampleName -Ov -o out.vcf

  2. Another method is to use a script, for example this one on GitHub: https://github.com/arrogantrobot/23andme2vcf

Also, it seems you have multiallelic sites in the 23andMe dataset. Many programs don't work well with those, and one convention is to either throw them away or split them into biallelic records. bcftools is a useful tool for resolving the multiallelic sites:

bcftools norm -m - input.vcf -o out.vcf

Finally, are you analyzing population level data? If not, why do you have multi-character alleles?


Thank you for your comment, but I actually need to convert the other way around. I have a VCF file generated by a GATK pipeline from Illumina reads, and I need to convert it to the 23andMe file format shown above.

Reply written 18 months ago by alec_djinn.

I have also tried the 23andme2vcf script. It generates the 23andMe file format, but the last column (genotype) is empty :(

Reply written 18 months ago by alec_djinn.

Ahh, my mistake in interpreting it the other way around. Have you tried this: https://github.com/2sh/vcf-to-23andme

Reply written 17 months ago by Shab86.
chrchang523 wrote (18 months ago):

There are two problems here.

  1. The 23andMe format does not support multi-character alleles; you must reorganize your data so that none of these remain. Split length-preserving multi-nucleotide variants into a bunch of single-nucleotide variants. (As for length-changing variants, 23andMe has historically represented some common insertions with "I", some common deletions with "D", and thrown out everything else. This requires you to write a script to postprocess the VCF file, and is unlikely to be worth the trouble.)

  2. The example data you posted is missing the rightmost two columns ("FORMAT" and the actual sample data). Assuming they exist and just failed to be copy/pasted, the errors reported by plink and other programs imply that there is no "GT" field at the beginning of the "FORMAT" column; that's the standard way of representing the actual data you want to convert to 23andMe-format. You need to figure out how to add a sufficiently-accurate GT field to your VCF.
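The postprocessing described in the two points above could start from something like the following Python sketch. Every function name here (`split_mnv`, `indel_genotype`, `format_starts_with_gt`) is hypothetical. The I/D rule in particular is an assumption: it writes the longer allele as "I" and the shorter one as "D", which should be checked against whatever tool will read the file. The sketch handles biallelic records only.

```python
def split_mnv(pos, ref, alt):
    """Split a length-preserving multi-nucleotide variant into per-base SNVs.

    Returns (position, ref_base, alt_base) tuples for the bases that differ.
    Each resulting SNV would reuse the genotype of the original record.
    """
    if len(ref) != len(alt):
        raise ValueError("length-changing variant; cannot split base by base")
    return [(pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def indel_genotype(ref, alt, gt):
    """Encode the genotype of a biallelic length-changing variant as I/D letters.

    Assumed convention: the longer allele is "I", the shorter allele is "D",
    and a no-call is written "--".
    """
    if "," in alt:
        raise ValueError("multiallelic record; split it first")
    if len(ref) == len(alt):
        raise ValueError("not a length-changing variant")
    # Index 0 is the letter for REF, index 1 the letter for ALT.
    letters = ("D", "I") if len(alt) > len(ref) else ("I", "D")
    calls = gt.replace("|", "/").split("/")
    if "." in calls:
        return "--"
    return "".join(letters[int(i)] for i in calls)


def format_starts_with_gt(format_field):
    """Check that GT is the first key of the FORMAT column."""
    return format_field.split(":")[0] == "GT"


if __name__ == "__main__":
    # The TGCT > ACTA record from the question, at position 814297.
    for snv in split_mnv(814297, "TGCT", "ACTA"):
        print(snv)
    print(indel_genotype("GTGTT", "C", "0/1"))
    print(format_starts_with_gt("GT:GQ:DP"))
```

Complex records such as GTGTT > C change length and substitute a base at the same time, so collapsing them to a single I/D letter loses information. That is part of why this is "unlikely to be worth the trouble" unless the indels are really needed downstream.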


Yes, it was a copy-paste error, now fixed. Thank you for your suggestion. I would have preferred a ready-to-use tool. If there is none, then yes, I am going to code it myself.

Reply written 18 months ago by alec_djinn.
maria.vazquez wrote (18 months ago):

Hi, at Gencove we just launched an open and free API with tools that allow users to upload almost any type of DNA file (23andMe, Ancestry, FTDNA, etc). Feel free to test it as a user too. We also give back a VCF.

www.gencove.com/researchers

Let me know if you have any questions.


OP is asking for a conversion from VCF to 23andMe format. Can your tool do this?

Reply written 18 months ago by genomax.