plink2 bgen to vcf ukbiobank
4.9 years ago
richyanicky ▴ 30

Hello

I have imputed data from ukbiobank in bgen format. I would like to convert it to a vcf file.

I can use plink2 to make pgen files and then use plink2 again to create a vcf

plink2 --bgen ukb_imp_chr17_v3.bgen --sample ukimp_chr17_v3_s.sample --make-pgen

plink2 --pgen plink2.pgen --pvar plink2.pvar --psam plink2.psam  --export vcf

This creates a vcf file, but none of our pipelines can process it.

  1. Does what I did look correct?
  2. How do I check the vcf file for accuracy?

Thank you in advance,

Richard

software-error plink plink2 • 13k views

Hi Richard,

Did you figure out what was the problem? I am having the same issue.

Thanks!


Yes, I used the answer below by chrchang253 and it worked.


Further question for Chris here: for the UK Biobank BGEN data, what's the proper REF/ALT mode?

Warning: No --bgen REF/ALT mode specified ('ref-first', 'ref-last', or 'ref-unknown'). This will be required as of alpha 3.


The alpha 3 error message explicitly notes that UK Biobank BGENs use 'ref-first' encoding.
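
A minimal sketch of where the mode goes, assuming the file names from the question and a hypothetical output prefix: the REF/ALT mode is written directly after the .bgen filename. The script only prints the command unless plink2 and both input files are present.

```shell
# File names are taken from the question; the --out prefix is an assumption.
bgen="ukb_imp_chr17_v3.bgen"
sample="ukimp_chr17_v3_s.sample"
out="ukb_imp_chr17_v3"

# 'ref-first' is a positional modifier of --bgen, so it follows the filename.
set -- plink2 --bgen "$bgen" ref-first --sample "$sample" --make-pgen --out "$out"
import_cmd="$*"
echo "$import_cmd"

# Run the import only when plink2 and both inputs actually exist.
if command -v plink2 >/dev/null 2>&1 && [ -f "$bgen" ] && [ -f "$sample" ]; then
  "$@"
fi
```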

4.9 years ago

Assuming you want dosage information in your VCF, you need to replace "--export vcf" with something like "--export vcf vcf-dosage=DS". You may also want to add the 'bgz' modifier to request bgzipping of the VCF file.
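
As a rough sketch of that suggestion, assuming the pgen fileset from the question still has the default prefix plink2 and using a hypothetical output prefix, the export step with both modifiers could look like this. The command is only printed unless plink2 and the .pgen file are present.

```shell
pfile="plink2"          # default prefix written by the earlier --make-pgen run
out="plink2_dosage"     # hypothetical output prefix

# 'bgz' requests a bgzipped VCF; 'vcf-dosage=DS' adds a DS field with dosages.
set -- plink2 --pfile "$pfile" --export vcf bgz vcf-dosage=DS --out "$out"
export_cmd="$*"
echo "$export_cmd"

# Run the export only when plink2 and the fileset actually exist.
if command -v plink2 >/dev/null 2>&1 && [ -f "$pfile.pgen" ]; then
  "$@"
fi
```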

4.0 years ago
bcole ▴ 50

Note that you can do this in one step if you want, e.g.

plink2 --bgen ukb_imp_chr21_v3.bgen --sample ukb_imp_chr21_v3_s.sample --export vcf vcf-dosage=DS

Quick note about converting UK Biobank BGEN to VCF: I first tried this using QCTOOL, and after 15 days only ~1.1 million lines from each chromosome had been written. At that pace it would take ~15 weeks to convert chromosome 1 to VCF.gz using QCTOOL.

I then saw this post and installed a plink2 module on my HPC and was able to convert chr21 UKBB BGEN to VCF (not bgz) in 98 minutes! This means that plink2 seems to convert BGEN to VCF many, many times faster than QCTOOL.

As Chris points out, just the chr21 (the smallest autosome) VCF file from UKBB was 2.4TB, so you should definitely consider using the bgz modifier to reduce file size.
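
On the original question of how to check the resulting VCF: with files this large, a streamed structural check is one cheap first step. The sketch below is not a full validator. vcf_check is a hypothetical helper that counts samples and records, flags records whose column count differs from the header, and counts records whose FORMAT column lacks DS. It relies on gzip being able to read bgzipped files, and the optional second argument stops after that many records.

```shell
# vcf_check FILE [MAX_RECORDS]: print basic structural counts for a VCF.
# Hypothetical helper; MAX_RECORDS of 0 (the default) scans the whole file.
vcf_check() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;   # bgzip output is gzip-compatible
    *)    cat "$1" ;;
  esac | awk -F'\t' -v max="${2:-0}" '
    /^##/     { next }                                # meta-information lines
    /^#CHROM/ { ncol = NF; nsamp = NF - 9; next }     # column header line
    {
      nvar++
      if (NF != ncol) bad++                           # width differs from header
      n = split($9, keys, ":"); has = 0
      for (i = 1; i <= n; i++) if (keys[i] == "DS") has = 1
      if (!has) nods++                                # FORMAT has no DS key
      if (max > 0 && nvar >= max) exit
    }
    END {
      printf "samples=%d variants=%d bad_width=%d missing_DS=%d\n", nsamp, nvar, bad, nods
    }'
}

# Example: look at the first 100000 records of an assumed output file.
if [ -f plink2.vcf.gz ]; then
  vcf_check plink2.vcf.gz 100000
fi
```

A nonzero bad_width or missing_DS count points at a malformed or dosage-free file. Parsing the file with a dedicated tool such as bcftools would be a stricter test.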


I want to add --minimac3-r2-filter 0.9 to your command. Do you think it will work?


Dear Bcole,

Not sure if you will read this, but I used your command to convert my data from bgen to vcf. Unfortunately, I get this error:

(imlabtools) [s1997351@node2f24(eddie) Bgen_vcf_test]$ plink2 --bgen crcsurvival_chr1.bgen --sample crcsurvival_chr1.sample --export vcf vcf-dosage=DS
PLINK v2.00a2.3LM 64-bit Intel (24 Jan 2020)   www.cog-genomics.org/plink/2.0/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --bgen crcsurvival_chr1.bgen
  --export vcf vcf-dosage=DS
  --sample crcsurvival_chr1.sample

Start time: Wed Apr  6 08:34:05 2022
Warning: No --bgen REF/ALT mode specified ('ref-first', 'ref-last', or
'ref-unknown').  This will be required as of alpha 3.
385228 MiB RAM detected; reserving 192614 MiB for main workspace.
Allocated 14461 MiB successfully, after larger attempt(s) failed.
Using up to 32 threads (change this with --threads).
--bgen: 640921 variants detected, format v1.2.
Error: Invalid categorical phenotype '0' on line 3, column 5 of .sample file
(positive integer < 2^31 or --missing-code value expected).
End time: Wed Apr  6 08:34:05 2022

Do you, or anyone else, have any idea what I need to do differently?

Thanks a lot!


This is a newer type of .sample file; you need to update to a plink2 build from August 2020 or later to import it.
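
To see what the error is pointing at, it can help to look at the reported field directly. In a .sample file the first line holds column names and the second holds column type codes, so line 3 is the first sample row. A small sketch follows, where sample_field is a hypothetical helper and the file name is the one from the log above. It also prints the installed plink2 version so the build date can be compared.

```shell
# sample_field FILE LINE COL: print one whitespace-delimited field.
# Hypothetical helper for inspecting a .sample file.
sample_field() {
  awk -v line="$2" -v col="$3" 'NR == line { print $col; exit }' "$1"
}

sample="crcsurvival_chr1.sample"
if [ -f "$sample" ]; then
  head -n 3 "$sample"            # two header lines plus the first sample row
  sample_field "$sample" 1 5     # name of column 5
  sample_field "$sample" 2 5     # type code of column 5
  sample_field "$sample" 3 5     # the value that plink2 rejected
fi

# Print the build date of the installed plink2, if there is one.
if command -v plink2 >/dev/null 2>&1; then
  plink2 --version
fi
```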


You need to add a REF/ALT mode to your command to tell plink2 which allele is REF. It is usually the first one, so add 'ref-first' to your command line: 'plink2 --bgen ukb_imp_chr21_v3.bgen ref-first --sample ukb_imp_chr21_v3_s.sample --export vcf vcf-dosage=DS'. If you do not know which allele is REF, use 'ref-unknown' instead.


Hi, I used your command to convert my data from bgen to vcf on the UK Biobank RAP, and I also get an error.

The command I used is as follows:

plink2 --bgen ukb21008_c11_b0_v1.bgen 'ref-last' --sample ukb21008_c11_b0_v1.sample --make-bed --out ukb21008_c11_b0_v1


I am really confused. Have you or anyone else encountered this problem?
