How to do quality control steps on UKBiobank data?
2.5 years ago
anamaria ▴ 160

Hello,

I downloaded imputed .bgen and .sample files from UKBiobank and am now planning to run a GWAS on them. I plan to use Plink2.

Can you please tell me which QC steps I would have to perform?

I was thinking of doing these:

-remove related individuals
-remove non-EUR individuals
-remove SNPs with minor allele frequency < 0.001
-account for ancestry in the model


Is there a standard pipeline for doing this in Plink2, or are there related files from UKBiobank?

Thanks Ana

ukbiobank • 1.3k views

Entirely depends on the study. I would recommend thinking about what you're trying to accomplish and exploring the literature for meaningful filtering approaches depending on what you want to do.


Hi,

Yes, I agree, and I mentioned above the 4 QC steps I plan to do. My question is more about how to do this in Plink2, or in some other software.

For example, to deal with MAF I would do this:

plink2 --bgen ukb_imp_chr17_v3.bgen ref-first --sample ukb44316_imp_chr17_v3_s487317.sample --maf 0.001 --make-bpgen --out chr17

But I don't know how to deal with the other 3 QC steps. I should also mention that this is imputed data from UKBiobank.

Thanks Ana


Hi! Did you manage to solve this?

23 months ago
Sam ★ 4.2k

I have a rough Nextflow pipeline for this. You can find the scripts here

You can read the help message to see which files you need, and you can read the script to see what it actually does.

You will also need the GreedyRelated program I wrote to run the script, which can be found here
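
If you only want the Plink2 side of the sample filtering, here is a minimal sketch and not the pipeline above. It assumes you have already put together a per-sample table with one flag for the ancestry filter and one for the unrelated set; the file name samples_qc.tsv and its column names are made up for illustration and are not UKBiobank file or field names. The idea is to turn that table into a keep list and pass it to plink2 with --keep alongside the --maf filter from the question. The toy table is only there so the sketch runs end to end, and the plink2 command is printed rather than executed.

```shell
#!/bin/sh
set -eu

# Hypothetical input (placeholder name and columns, not a UKBiobank file):
#   samples_qc.tsv: eid, white_british (0/1), unrelated (0/1)
workdir=$(mktemp -d)
printf 'eid\twhite_british\tunrelated\n' > "$workdir/samples_qc.tsv"
printf '1001\t1\t1\n1002\t1\t0\n1003\t0\t1\n1004\t1\t1\n' >> "$workdir/samples_qc.tsv"

# Keep samples that pass both filters. Two columns are written (FID IID),
# using the eid for both, as in the UKBiobank .sample files.
awk -F'\t' 'NR > 1 && $2 == 1 && $3 == 1 { print $1, $1 }' \
    "$workdir/samples_qc.tsv" > "$workdir/keep.txt"

# Sample filter plus the MAF filter from the question, in one plink2 call.
# Printed only (dry run); copy the line to a terminal to actually run it.
plink_cmd="plink2 --bgen ukb_imp_chr17_v3.bgen ref-first \
--sample ukb44316_imp_chr17_v3_s487317.sample \
--keep $workdir/keep.txt --maf 0.001 --make-bpgen --out chr17_qc"
echo "$plink_cmd"
```

Ancestry is then usually handled at the association step rather than at QC, for example by supplying principal components as covariates with --covar when running --glm.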