Question: bcf convert 23andme to vcf
stuartkkim wrote (20 months ago):

Hi,

I need to convert a 23andme file to VCF using bcftools. The command is:

bcftools convert --tsv2vcf input.tab.gz -f ref.fa -s SampleName -Ob -o sample.bcf

I have a 23andme.txt file.

What do I use for "input.tab.gz"; can I use the 23andme.txt file or do I need to convert it first?

What do I use for "ref.fa"? Where can I get a ref.fa file for build 37?

Is "SampleName" just the name of the individual in the 23andme file?

I also tried plink: I loaded the 23andme file and exported it with --recode vcf. The problem is that no ALT allele is written when the genotype is homozygous. Is there a way to insert the ALT allele? If not, the plink solution does not help.

Thanks

Tags: snp, bcftools

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

written 20 months ago by _r_am

New issue arose:

bcftools convert --tsv2vcf input.tab.gz -f ref.fa -s SampleName -Ob -o sample.bcf

input.tab.gz is a 23andme.txt file that is version 2 and build 36.

ref.fa is Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz.

So the reference FASTA is build 37 and the 23andme file is build 36. Where can I get a reference FASTA for build 36? I cannot find one archived at Ensembl.
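A quick way to confirm which build a file uses before picking a reference is to read its comment header. The sketch below assumes the header contains a line mentioning "build NN" (the exact wording may differ between file versions); `detect_build` and `build_name` are hypothetical helper names, not part of any tool.

```shell
# Hypothetical helper: print the build number mentioned in the '#' comment
# header of a 23andMe raw-data file (assumes a phrase like "build 36").
detect_build() {
    sed -n 's/^#.*[Bb]uild \([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Map the build number to the usual assembly names.
build_name() {
    case "$1" in
        36) echo "NCBI36 (UCSC hg18)" ;;
        37) echo "GRCh37 (UCSC hg19)" ;;
        38) echo "GRCh38 (UCSC hg38)" ;;
        *)  echo "unknown" ;;
    esac
}

# Example (file name is a placeholder):
# build_name "$(detect_build 23andme.txt)"
```

The reference FASTA passed to -f should be the assembly this reports, since the positions in the file only make sense against that build.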

thanks. Stuart

written 19 months ago by stuartkkim

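You can find the hg16 human genome build here: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.10/ Note that NCBI build 36 corresponds to UCSC hg18 (hg16 is build 34), so check that the assembly you download matches the build of your 23andme file.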

modified 19 months ago • written 19 months ago by GenoMax
Emily_Ensembl (EMBL-EBI) wrote (20 months ago):

input.tab.gz is your input from 23andMe. It's expecting a compressed (.gz) file, so you may wish to gzip it. Just check that your file is in the format shown on the bcftools page, e.g.:
rs6139074 20 63244 AA
rs1418258 20 63799 CC
rs6086616 20 68749 TT
rs6039403 20 69094 AG
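As a rough sketch of that check, the helper below skips `#` comment lines and reports any data line that does not have exactly four whitespace-separated fields (rsid, chromosome, position, genotype), then writes a gzip-compressed copy. The function names and the file name in the usage line are hypothetical.

```shell
# Hypothetical helper: fail if any data line does not have four fields.
check_23andme() {
    awk '
        /^#/ || /^[[:space:]]*$/ { next }   # skip comment header and blank lines
        NF != 4 { bad++; printf "line %d: expected 4 fields, found %d\n", NR, NF }
        END { exit (bad > 0) }              # non-zero exit status if any line was bad
    ' "$1"
}

# Compress a file that passed the check; gzip -c keeps the original in place.
compress_23andme() {
    check_23andme "$1" && gzip -c "$1" > "$1.gz"
}

# Example (file name is a placeholder):
# compress_23andme 23andme.txt
```

This only checks the column count, not whether the genotype or position values are valid.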

You can get a reference FASTA for GRCh37 from Ensembl.

The Sample Name is whatever you want to call it. That's what's going to appear in the genotype header in the VCF, so it's up to you.
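Putting the pieces together, here is a minimal sketch of the conversion as a wrapper function. It assumes `samtools` and `bcftools` are installed; `run` and `convert_23andme` are hypothetical names, and `DRY_RUN=1` only prints the commands. It uses `-Ov` to write plain VCF, whereas the `-Ob` in the original command writes BCF. As far as I know, faidx indexing needs an uncompressed or bgzip-compressed FASTA, so a plain gzip download may have to be decompressed first.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the command from the question.
# Arguments: <23andMe file> <reference FASTA> <sample name> <output VCF>
convert_23andme() {
    # Index the reference if no .fai file exists next to it yet.
    [ -e "$2.fai" ] || run samtools faidx "$2"

    # -s sets the sample name shown in the VCF header; -Ov writes VCF.
    run bcftools convert --tsv2vcf "$1" -f "$2" -s "$3" -Ov -o "$4"
}

# Example (all file names are placeholders):
# DRY_RUN=1 convert_23andme 23andme.txt.gz ref.fa SampleName sample.vcf
```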


Thanks Emily!!! That worked.

written 19 months ago by stuartkkim

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

modified 19 months ago • written 19 months ago by GenoMax
Gabriel R. (Danmarks Tekniske Universitet) wrote (13 months ago):

It is one line in glactools, the example in the test/ folder:

glactools 23andme2acf --epo epochr1.gz --fai human_g1k_v37.fasta.fai smallPublic23andMeData.gz anon | glactools glac2vcf -

For real data, you should probably replace epochr1.gz with all.epo.gz.
