Using Plink, I would like to calculate allele frequencies for a subset of individuals (cases) from a total cohort of 188 individuals (92 cases + 96 controls). I have proper .ped and .map files.
I have tried multiple options but I am not able to subset data. These are two typical ways I have tried:
./plink --file /path/in_data --chr 1-22 --allow-extra-chr --filter /path/cases.raw 1 --freq --make-bed --out out_data_cases
The .raw file has this format:
CAV-001 CAV-001 1
CAV-002 CAV-002 1
CAV-003 CAV-003 0
CAV-004 CAV-004 1
where the first column is the Family ID, the second column is the Individual ID, and the third column codes status (1 = case, 0 = control). I want to keep only the cases. For every sample, Family ID = Individual ID.
./plink --file /path/in_data --chr 1-22 --allow-extra-chr --keep /path/cases.txt --freq --make-bed --out out_data_cases
The cases.txt file includes columns 1 and 2 from the .raw file.
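For reference, here is a minimal sketch of one way such a keep-list could be derived from the three-column .raw file. The file names and the four sample rows are placeholders taken from the excerpt above, not my real paths, and it assumes whitespace-separated columns:

```shell
# Placeholder data in the same layout as the .raw file: FID IID status
cat > cases.raw <<'EOF'
CAV-001 CAV-001 1
CAV-002 CAV-002 1
CAV-003 CAV-003 0
CAV-004 CAV-004 1
EOF

# Keep FID and IID (columns 1 and 2) for rows whose third column is 1 (cases)
awk '$3 == 1 { print $1, $2 }' cases.raw > cases.txt

# Count how many individuals ended up in the keep-list
awk 'END { print NR }' cases.txt
```

With the real data, the count on the last line should match the number of cases (92) if the status column is coded as described.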
This is what I get in the .log file (some paths are not shown):
16384 MB RAM detected; reserving 8192 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (868263 variants, 188 people).
....
868263 variants loaded from .bim file.
188 people (0 males, 0 females, 188 ambiguous) loaded from .fam.
Ambiguous sex IDs written to xxxx.nosex
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 188 founders and 0 nonfounders present.
Calculating allele frequencies... done.
--freq: Allele frequencies (founders only) written to xxxx.frq
868263 variants and 188 people pass filters and QC.
Note: No phenotypes present.
--make-bed to ....
My question is similar to "Cannot remove subjects from Plink files", but I have tried what is suggested there without success. Please help!