Hi,
I need to convert a 23andme file to VCF using bcftools. The command is:
bcftools convert --tsv2vcf input.tab.gz -f ref.fa -s SampleName -Ob -o sample.bcf
I have a 23andme.txt file.
What do I use for "input.tab.gz"; can I use the 23andme.txt file or do I need to convert it first?
What do I use for "ref.fa"? Where can I get a ref.fa file for build 37?
Is "SampleName" just the name of the individual in the 23andme file?
I used plink to read the 23andme file with --recode vcf. The problem is that there is no ALT allele if the genotype is homozygous. Is there a way to insert the ALT allele? If not, then the plink solution does not help.
Thanks
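For context on the input question: a 23andMe raw data export is a plain-text, tab-separated file in which comment lines start with `#` and each data line holds an rsid, chromosome, position, and genotype. Here is a minimal Python sketch of reading that layout. The function name `parse_23andme` and the sample rows are made up for illustration and are not taken from any real file.

```python
import io

def parse_23andme(handle):
    """Yield (rsid, chrom, pos, genotype) from a 23andMe-style raw data file."""
    for line in handle:
        line = line.rstrip("\r\n")
        # Skip blank lines and the '#'-prefixed header comments.
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")[:4]
        yield rsid, chrom, int(pos), genotype

# Hypothetical rows, for illustration only.
example = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t1\t2000\tAG\n"
    "rs0000003\tX\t3000\t--\n"
)
records = list(parse_23andme(example))
```

Note that the file itself carries no REF/ALT information, only the two observed alleles per site, which is why a reference FASTA is needed for the conversion.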
Note that ALL of the solutions here have the same limitation as plink. It’s impossible to report an ALT allele if it simply isn’t in the data; you need an additional SNP database file. It’s only REF alleles that can be reliably filled in without that (with plink, you’d use plink2’s --ref-from-fa flag).
I thought the reply below does not have this problem, but it does. Where can we find the ALT alleles for 23andMe data?
I guess it doesn't really matter, as you only DON'T know the ALT allele when the genotype is REF/REF in the first place.
If you have several samples, you can guess the ref more reliably.
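To make the point about REF/REF genotypes concrete, here is a rough Python sketch of how an ALT allele can be read off a single genotype call once the REF base is known from the FASTA. The function name `infer_alt` is hypothetical, and the sketch assumes plain SNP calls written as two letters; no-calls (`--`) and indel codes are simply ignored.

```python
def infer_alt(ref, genotype):
    """Return the non-REF alleles seen in one genotype call, or "." if none.

    ref      -- reference base taken from the FASTA (assumed to be A/C/G/T)
    genotype -- 23andMe-style genotype string such as "AG"
    """
    ref = ref.upper()
    alts = []
    for allele in genotype.upper():
        # Keep only real bases that differ from REF, without duplicates.
        if allele in "ACGT" and allele != ref and allele not in alts:
            alts.append(allele)
    return ",".join(alts) if alts else "."

print(infer_alt("A", "AA"))  # "."  (REF/REF: ALT is not observable)
print(infer_alt("A", "AG"))  # "G"
print(infer_alt("A", "GG"))  # "G"
print(infer_alt("A", "--"))  # "."  (no-call)
```

The only case that returns `.` for a real call is the homozygous-reference one, which is the situation where a separate SNP database would be needed to name the ALT allele.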
Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.
New issue arose:
input.tab.gz is a 23andme.txt file that is version 2 and build 36.
ref.fa is Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz.
So the ref fa is build 37, but the 23andme file is build 36. Where can I get a ref fa file for build 36? I cannot find one archived at Ensembl.
thanks. Stuart
You can find hg16 human genome build here: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.10/