Question: How to do quality control steps on UKBiobank data?
15 months ago
anamaria110 wrote:


I downloaded the imputed .bgen and .sample files from UKBiobank and I am now planning to run a GWAS on them. I plan to use Plink2.

Can you please tell me which QC steps I would have to perform?

I was thinking to do these:

- remove related individuals
- remove non-EUR individuals
- remove SNPs with minor allele frequency < 0.001
- model using ancestry info

Is there a standard pipeline for doing this in Plink2, or are there related files from UKBiobank?

Thanks Ana

ukbiobank • 470 views
written 15 months ago by anamaria110

Entirely depends on the study. I would recommend thinking about what you're trying to accomplish and exploring the literature for meaningful filtering approaches depending on what you want to do.

written 15 months ago by Brice Sarver3.5k


Yes, I agree, and I listed above the four QC steps I plan to do. My question is more about how to do this in Plink2, or in some other software.

For example, to apply the MAF filter I would run:

plink2 --bgen ukb_imp_chr17_v3.bgen ref-first --sample ukb44316_imp_chr17_v3_s487317.sample --maf 0.001 --make-bpgen --out chr17

But I don't know how to handle the other three QC steps. I should also mention that this is imputed data from UKBiobank.

Thanks Ana

written 15 months ago by anamaria110
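
For the three remaining steps, one possible layout is sketched below. It assumes that a keep list of unrelated EUR sample IDs has already been prepared (for instance from the sample-level QC and relatedness information released alongside the genotypes) and that ancestry is modelled by putting principal components into a covariate file. The file names keep.txt, pheno.txt and covar.txt are hypothetical, and the sketch writes with --make-pgen instead of the --make-bpgen used in the command above. The helper only assembles the plink2 argument lists; it does not run anything.

```python
import shlex


def qc_command(bgen, sample, keep_file, out, maf=0.001):
    """Build a plink2 call that keeps the listed samples and filters on MAF."""
    return [
        "plink2",
        "--bgen", bgen, "ref-first",
        "--sample", sample,
        "--keep", keep_file,      # assumed: IDs of unrelated EUR samples
        "--maf", str(maf),
        "--make-pgen",
        "--out", out,
    ]


def gwas_command(pfile, pheno, covar, out):
    """Build a plink2 association call with covariates (e.g. ancestry PCs)."""
    return [
        "plink2",
        "--pfile", pfile,
        "--pheno", pheno,         # hypothetical phenotype file
        "--covar", covar,         # hypothetical covariate file holding the PCs
        "--glm", "hide-covar",
        "--out", out,
    ]


if __name__ == "__main__":
    qc = qc_command(
        "ukb_imp_chr17_v3.bgen",
        "ukb44316_imp_chr17_v3_s487317.sample",
        "keep.txt",
        "chr17",
    )
    # Print shell-ready command lines for inspection before running them.
    print(shlex.join(qc))
    print(shlex.join(gwas_command("chr17", "pheno.txt", "covar.txt", "chr17_gwas")))
```

Printing the commands first makes it easy to check them per chromosome before passing the lists to subprocess.run.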

Hi! Did you manage to solve this?

written 8 months ago by catarinaglmg10
8 months ago
New York
Sam3.2k wrote:

I have a rough Nextflow pipeline for this. You can find the scripts here

You can read the help message to see which files you need, and you can read the script to see what it actually does.

To run the script you will also need the GreedyRelated program I wrote, which can be found here
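
As a rough illustration of what greedy pruning of related samples can look like, here is a minimal sketch. It is not the GreedyRelated program mentioned above and may differ from it in every detail. It assumes the input is a list of sample ID pairs that were already judged to be related (for example by thresholding a kinship table), and the function name greedy_unrelated is made up for this example.

```python
from collections import defaultdict


def greedy_unrelated(pairs):
    """Return the set of sample IDs to drop so that no related pair remains.

    Repeatedly drops the sample that still has the most relatives.
    """
    neighbours = defaultdict(set)
    for a, b in pairs:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)

    removed = set()
    while True:
        # Samples that still have at least one relative in the data.
        live = {s: n for s, n in neighbours.items() if n}
        if not live:
            break
        # Most-connected sample; ties are broken by ID to stay deterministic.
        worst = max(live, key=lambda s: (len(live[s]), s))
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
        removed.add(worst)
    return removed


if __name__ == "__main__":
    related = [("A", "B"), ("A", "C"), ("D", "E")]
    print(sorted(greedy_unrelated(related)))  # ['A', 'E']
```

The returned IDs could then be written to a file and excluded when building the sample keep list. A greedy pass like this does not guarantee the largest possible unrelated set, only that no listed pair survives intact.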

written 8 months ago by Sam3.2k
