Using PLINK, I would like to calculate allele frequencies for a subset of individuals (the cases) from a total cohort of 188 individuals (92 cases + 96 controls). I have valid .ped and .map files.
I have tried several options, but I am not able to subset the data. These are the two main approaches I have tried:
1)
./plink --file /path/in_data --chr 1-22 --allow-extra-chr --filter /path/cases.raw 1 --freq --make-bed --out out_data_cases
The .raw file has this format:
CAV-001 CAV-001 1
CAV-002 CAV-002 1
CAV-003 CAV-003 0
CAV-004 CAV-004 1
where the first column is the Family ID, the second column is the Individual ID, and the third column is 1 for cases and 0 for controls. I want to keep only the cases. For every individual, Family ID = Individual ID.
2)
./plink --file /path/in_data --chr 1-22 --allow-extra-chr --keep /path/cases.txt --freq --make-bed --out out_data_cases
The cases.txt file includes columns 1 and 2 from the .raw file.
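For reference, here is a minimal sketch of how a keep list like cases.txt could be derived from the .raw file, assuming the list should contain only the rows with 1 in the third column. The file names cases_example.raw and cases_example.txt are hypothetical, and the four rows are just the sample shown above, not the real data.

```shell
# Hypothetical example input: the four sample rows from the .raw file shown above.
printf '%s\n' \
  'CAV-001 CAV-001 1' \
  'CAV-002 CAV-002 1' \
  'CAV-003 CAV-003 0' \
  'CAV-004 CAV-004 1' > cases_example.raw

# Keep Family ID and Individual ID (columns 1 and 2) for rows where column 3 is 1 (cases).
awk '$3 == 1 { print $1, $2 }' cases_example.raw > cases_example.txt

# Sanity check: count how many individuals ended up in the keep list.
n_cases=$(awk 'END { print NR }' cases_example.txt)
echo "Individuals in keep list: $n_cases"
```

On the full file, the count should be 92 if the list contains only the cases.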
This is what I get in the .log file (some paths are not shown):
16384 MB RAM detected; reserving 8192 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (868263 variants, 188 people).
....
868263 variants loaded from .bim file.
188 people (0 males, 0 females, 188 ambiguous) loaded from .fam.
Ambiguous sex IDs written to xxxx.nosex
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 188 founders and 0 nonfounders present.
Calculating allele frequencies... done.
--freq: Allele frequencies (founders only) written to xxxx.frq
868263 variants and 188 people pass filters and QC.
Note: No phenotypes present.
--make-bed to ....
My question is similar to "Cannot remove subjects from Plink files", but I have tried what is suggested there without success. Please help!