Question

GWAS data QC in PLINK

3.7 years ago

AR • 0

Hi everyone,

I am running a GWAS in which the pedigree information and marker data come from an Axiom array. I have PLINK-compatible .ped and .map files, which I converted to binary .bed, .bim and .fam files for use in PLINK.

I need to run QC on these files in PLINK before imputation. I want to do sample QC first and marker QC second and, most importantly, keep a separate output file for each QC step rather than overwriting the previous one. Because the dataset is large, I would like to automate this pipeline. Is there a way to get separate, independent files for every step? PLINK has a "--script" option, but as far as I understand it requires supplying all filters and thresholds together, and it produces a single filtered file at the end. Is there another option, or does one have to write a Python script to automate this? Pardon me, I am not good at Python.

Any help is highly appreciated.

Thanks

gwas plink QC imputation • 1.2k views

updated 3.7 years ago by Sam ★ 4.7k • written 3.7 years ago by AR • 0

score 1 · Answer 1 · 2020-08-08

The --write-snplist and --make-just-fam options are quite helpful here: they write a new SNP list and a new .fam file without duplicating the genotype file (which is usually big). Once QC is complete, you can generate the final genotype file with --extract (for the SNP list) and --keep (for the sample list). Usually we just run each command separately, or write a pipeline for it. I have an old pipeline here that I should update soon that does something similar to what you want. You are welcome to modify it to suit your needs.
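To make the idea concrete, here is a minimal shell sketch of running each QC step as its own command with its own --out prefix, so nothing is overwritten. It assumes PLINK 1.9 is on your PATH as `plink`. The function names (`run`, `qc_pipeline`), the step prefixes, and all thresholds (0.05, 0.01, 1e-6) are placeholders I made up for illustration, not recommendations, and this is not the pipeline linked above.

```shell
#!/usr/bin/env bash
# Sketch of a stepwise PLINK 1.9 QC run: every step writes to its own --out prefix.
# All thresholds and file names below are illustrative assumptions.

# Print each command, then execute it unless DRY_RUN is set to a non-empty value.
run() {
    echo "$*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Usage: qc_pipeline <input_bfile_prefix> <output_dir>
qc_pipeline() {
    local src="$1" out="$2"
    mkdir -p "$out" || return 1

    # Step 1, sample QC: drop samples with > 5% missing calls; writes only a .fam
    run plink --bfile "$src" --mind 0.05 \
        --make-just-fam --out "$out/s1_mind" || return 1

    # Step 2, marker QC on retained samples: drop SNPs with > 5% missing calls
    run plink --bfile "$src" --keep "$out/s1_mind.fam" --geno 0.05 \
        --write-snplist --out "$out/s2_geno" || return 1

    # Step 3: drop SNPs with minor allele frequency below 0.01
    run plink --bfile "$src" --keep "$out/s1_mind.fam" \
        --extract "$out/s2_geno.snplist" --maf 0.01 \
        --write-snplist --out "$out/s3_maf" || return 1

    # Step 4: drop SNPs failing the Hardy-Weinberg test at p < 1e-6
    run plink --bfile "$src" --keep "$out/s1_mind.fam" \
        --extract "$out/s3_maf.snplist" --hwe 1e-6 \
        --write-snplist --out "$out/s4_hwe" || return 1

    # Final step: write the genotype files once, from the last sample and SNP lists
    run plink --bfile "$src" --keep "$out/s1_mind.fam" \
        --extract "$out/s4_hwe.snplist" \
        --make-bed --out "$out/final_qc" || return 1
}
```

Save it as a file, `source` it, and call something like `qc_pipeline mydata qc_out`, where `mydata` is the prefix of your .bed/.bim/.fam files. Setting `DRY_RUN=1` first only prints the commands, which is a cheap way to check the chain before running it on a large dataset. Each step leaves its own .log plus a .fam or .snplist, so you can see how many samples or SNPs every filter removed.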