Plink 2 user error converting .bgen to bcf
1
0
Entering edit mode
2.2 years ago
Salahuddin • 0

Hi,

I have 22 chromosome .bgen files separately. For PRS, i want to merge them using -pmerge. For that I need bcf formats. So basically I'm at 1st stage, converting .bgen to bcf and quality control. But after giving command, I'm getting error message -

Error: Only VCF, oxford, bgen-1.x, haps, hapslegend, A, AD, A-transpose, and ind-major-bed output have been implemented so far.

My command -

plink2 \
--bgen /mnt/d/ukb_genetic_data/ukb_imp_chr1_v3.bgen ref-first \
--sample /mnt/d/ukb_genetic_data/ukb30172_imp_chr22_v3_s487324.sample \
--maf 0.01 \
--hwe 1e-6 \
--geno 0.01 \
--mind 0.01 \
--export bcf \
--out chr101

Thanks for your help.

bcf Plink bgen2 • 4.3k views
ADD COMMENT
0
Entering edit mode

What exactly do you want to do? You want to merge the bgen file? Or do you want to convert bgen files into VCF formats? Most PRS software I know of does not support bcf / vcf formats.

ADD REPLY
0
Entering edit mode

For PRS analysis, I want to merge the bgen files.

ADD REPLY
0
Entering edit mode

Hey Sam, Maybe it will help more, If I write in details. This is my first time using PRSice. I want to do PRS on UKB dataset of a specific trait. I have 22 .bgen, 22 .bgi and 1 sample file (1 for each chromosome = 22) in hand. To make the target file ready I understood, I need to quality control these 22 .bgen file and merge them into one file. To merge the files, I found (https://groups.google.com/g/plink2-users/c/hkgxoyjBbWg/m/4YTauyQrAgAJ) i need to do -pmerge. Correct me if I'm wrong.

ADD REPLY
0
Entering edit mode

If you are using PRSice, you can do --target ukb_chr#_v3 and we can automatically loop through the bgen files. As for QC, as you are doing UKB, there should be a mgi or mxx file (I forgot the suffix) that contain the MAF and INFO score which you can use for filtering. Though I'd recommend first running the PRS analyses with the genotype file as most of the time, imputed data does not add substantial amount of signal, but takes way longer to run.

ADD REPLY
0
Entering edit mode

Thanks for your reply. Using --target command is a good idea. But to be clear, do I not need to merge the files then? So I have Quality controlled 22 files and all generated files 22 copies all the way PRS analysis?

ADD REPLY
0
Entering edit mode

PRSice will automatically loop through all 22 chromosomes, if that is what you meant. You can just provide a single SNP list containing the SNP ID of those that passed QC.

ADD REPLY
0
Entering edit mode

Thanks for your reply. I'm a little confused with the QC step. From PRSice manual (https://www.prsice.info/quick_start/#quality-control-of-target-samples) I came to understand, the target samples need to be quality controlled using PLINK with following command plink --bfile ($target) \ --maf 0.05 \ --mind 0.1 \ --geno 0.1 \ --hwe 1e-6 \ --make-just-bim \ --make-just-fam \ --out ($target).qc Then, --keep ($target).qc.fam --extract ($target).qc.bim can be added to the PRSice command to filter out the samples and SNPs.

Is there other alternative that you are suggesting? With UKB application, I only have bgen, bgi and sample file.

ADD REPLY
0
Entering edit mode

If you are using UK Biobank, the minor allele frequency, info score and all those information were already computed and can be downloaded directly from here Therefore, you don't really need to compute the data yourself.

ADD REPLY
0
Entering edit mode

I have downloaded the file containing MAF and INFO score ukb_imp_mfi.tgz file for filtering as you mentioned above for QC. Could you mention the command script for the filtering? Currently I'm running the following script for only PRS construction without regression -

Rscript PRSice.R --dir . ^ --prsice PRSice_win64.exe ^ --base iPSYCH-PGC_ASD_Nov2017.gz ^ --snp SNP ^ --chr CHR ^ --bp BP ^ --A1 A1 ^ --A2 A2 ^ --stat OR ^ --pvalue P ^ --target ukb_imp_chr#_v3,ukb30172_imp_chr22_v3_s487324.sample ^ --type bgen ^ --no-regress ^ --thread 1 ^ --stat OR ^ --extract PRSice.valid ^ --binary-target T ^ --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 ^ --allow-inter ^ --fastscore ^ --score avg ^ --interval 5e-05 ^ --lower 5e-08 ^ --upper 0.5 ^ --missing mean_impute ^ --model add ^ --print-snp ^ --out trial_140222

ADD REPLY
0
Entering edit mode

You will need to filter out the require SNP names, then use --extract or --exclude from PRSice to do the filtering.

ADD REPLY
0
Entering edit mode

How do I filter out the require SNP names?I tried with PRSice with following command:

Rscript PRSice.R --dir . ^ --prsice PRSice_win64.exe ^ --target ukb_imp_chr#_v3,ukb30172_imp_chr22_v3_s487324.sample ^ --type bgen ^ --geno 0.02 ^ --maf 0.01 ^ --info 0.7 ^ --print-snp ^ --out ukb.QC

But I get error asking for Base file. After including the base file, it seems to start working on PRS score calculation. Could you help me out with the step by step commands. As I understood, with one command i do filter out SNP names, and then then use --extract or --exclude from PRSice to do the filtering. Thanks

ADD REPLY
0
Entering edit mode

Use awk or R on the mfi file. Identify any row with INFO score and MAF larger than desired threshold. Then you can write it out to a file. e.g. awk '$6 > 0.05 && $6 < 0.95 && $8 > 0.8{print $2}' mfi > valid.snp Will filter out SNPs with MAF > 0.5 ($6) and info score > 0.8 ($8) and print out the rs id ($2). You can find the header information here. (Note that, my copy of mfi seems to have slightly different format than what was shown in the document. So do change the column number according to the format of your mfi file)

ADD REPLY
0
Entering edit mode

Thanks a lot. It worked. Now I have this issue with --extract command. I have three commands with --extract now; one for the files with QCed SNPs of target file, another with the list of valid SNPs after clumping and another from last PRSice run .valid list. Please see my log below followed by my query- PRSice 2.3.5 (2021-09-20) https://github.com/choishingwan/PRSice (C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly GNU General Public License v3 If you use PRSice in any published work, please cite: Choi SW, O'Reilly PF. PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data. GigaScience 8, no. 7 (July 1, 2019) 2022-02-22 13:52:04 D:\ukb_genetic_data\PRSice_win64.exe \ --a1 A1 \ --a2 A2 \ --allow-inter \ --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \ --base iPSYCH-PGC_ASD_Nov2017.gz \ --base-info INFO:0.9 \ --binary-target T \ --bp BP \ --chr CHR \ --extract combined-csv-files.snp \ --fastscore \ --geno 0.02 \ --ignore-fid \ --interval 5e-05 \ --lower 5e-08 \ --missing mean_impute \ --model add \ --no-clump \ --no-regress \ --num-auto 22 \ --or \ --out trial_220222 \ --pvalue P \ --score avg \ --seed 140070291 \ --snp SNP \ --stat OR \ --target ukb_imp_chr#_v3,ukb30172_imp_chr22_v3_s487324.sample \ --thread 1 \ --type bgen \ --upper 0.5

Initializing Genotype file: ukb_imp_chr#_v3 (bgen) With external fam file: ukb30172_imp_chr22_v3_s487324.sample

Start processing iPSYCH-PGC_ASD_Nov2017

RS_ID assume to be column containing SNP ID

Base file: iPSYCH-PGC_ASD_Nov2017.gz GZ file detected. Header of file is: CHR SNP BP A1 A2 INFO OR SE P

9112386 variant(s) observed in base file, with: 3102224 variant(s) excluded based on user input 639495 variant(s) with INFO score less than 0.900000 805870 ambiguous variant(s) excluded 4564797 total variant(s) included from base file

Loading Genotype info from target

487409 people (223006 male(s), 264318 female(s)) observed 487409 founder(s) included

Error: A total of 8166 duplicated SNP ID detected out of 4560581 input SNPs! Valid SNP ID (post --extract / --exclude, non-duplicated SNPs) stored at trial_220222.valid. You can avoid this error by using --extract trial_220222.valid

I want to generate the score again now with clumped file (snp_140222.snp) extracting the valid target snp file (combined-csv-files.snp). how can I do that. Sorry for my long post and query.

ADD REPLY
0
Entering edit mode

Do I use -keep for the QCed target valid.snp or -extract? You mentioned before to use -extract.

ADD REPLY
0
Entering edit mode

For snp, use --extract. For samples, use --keep

ADD REPLY
0
Entering edit mode

Can I have more than one --extract in the command? As I have clumped file (snp_140222.snp) from last run of PRS scoring and the valid target snp file (combined-csv-files.snp) after QC. I also have a .valid file to extract when I give the command. Could you check my command below -

Rscript PRSice.R --dir . ^

--prsice PRSice_win64.exe ^

--base iPSYCH-PGC_ASD_Nov2017.gz ^

--snp SNP ^

--chr CHR ^

--bp BP ^

--A1 A1 ^

--A2 A2 ^

--stat OR ^

--pvalue P ^

--target ukb_imp_chr#_v3,ukb30172_imp_chr22_v3_s487324.sample ^

--type bgen ^

--no-clump ^

--no-regress ^

--thread 1 ^

--extract PRSice.valid ^

--binary-target T ^

--bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 ^

--allow-inter ^

--fastscore ^

--score avg ^

--interval 5e-05 ^

--lower 5e-08 ^

--upper 0.5 ^

--missing mean_impute ^

--model add ^

--extract snp_140222.snp ^

--ignore-fid ^

--extract combined-csv-files.snp ^

--geno 0.02 ^

--out trial_220222

ADD REPLY
0
Entering edit mode

nope. You need to merge them / make one yourself

ADD REPLY
0
Entering edit mode

Thanks for you reply. So I make one valid snp list file by merging QCed target file snp list and clumped snp list from last run. Then --extract that file only. While merging /making, do I keep only the SNP names in that file or other information also required?

ADD REPLY
0
Entering edit mode

Hi Sam, Thanks for your support, I've been able to work it out. I have a question, if you could help me out. While PRScoring with BGEN imputed UKB data, I only had INFO, MAF and Geno filtering as QC. Is it enough QC steps? From your paper I see additional recommendation of 'genotyping rate >0.99' and 'Hardy-Weinberg Equilibrium P >1 × 10−6'. How to apply that while working with UKB BGEN data? And if not needed, can i get a reference to that. Thanks

ADD REPLY
0
Entering edit mode
2.2 years ago
  1. As a general rule, you should post the full .log file from your failing run.
  2. In this particular case, I can guess that you're probably using a plink2 build that's at least two years old, and should update to a newer one.
ADD COMMENT

Login before adding your answer.

Traffic: 1780 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6