Hello,
I am trying to use PLINK (v1.07) to read in dosage data and write out a subset of that data without analyzing it. That is, I want to keep all of the SNPs but only a subset of the samples. Is this possible with PLINK? I have been using the --dosage and --write-dosage options, but without success. Details below.
I have imputed dosage data for 5 samples in a file called test.dose that looks like this:
SNP A1 A2 F1 D1-40 F2 D2-32 F3 D3-30 F4 D4-49 F5 D5-30
7:16719 A B 1.99800005159341 1.99599998397753 1.99800005159341 1.99800005159341 1.99599998397753
7:31273 A B 1.55099993944168 1.92400002479553 1.91199994832277 1.9119999781251 1.94999995082617
...
I have a test.fam file that looks like this:
F1 D1-40 -9 -9 2 -9
F2 D2-32 -9 -9 2 -9
F3 D3-30 -9 -9 2 -9
F4 D4-49 -9 -9 2 -9
F5 D5-30 -9 -9 2 -9
I have a list.txt file that contains the following:
F2 D2-32
F5 D5-30
I am running the following PLINK command:
plink --dosage test.dose format=1 --fam test.fam --keep list.txt --noweb --write-dosage --out subset
This outputs a file called subset.out.dosage, but it doesn't contain the dosages. It looks like this:
SNP A1 A2
7:16719 A B
7:31273 A B
...
What I would like is the above file but with dosages for the samples contained in list.txt. I realize that there are many tools for manipulating text, but is this possible with PLINK?
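For reference, the text-manipulation fallback mentioned above can be sketched in a few lines of Python. This is only a minimal sketch, not a PLINK feature: it assumes the whitespace-delimited layout shown above (a header of SNP A1 A2 followed by one FID IID pair per sample, and one dosage value per sample on each row), and the function name subset_dosage is made up for illustration.

```python
def subset_dosage(lines, keep):
    """Return dosage lines restricted to the (FID, IID) pairs in `keep`.

    Assumes the layout shown above: header "SNP A1 A2 FID IID FID IID ...",
    then one dosage value per sample on each SNP row.
    """
    lines = iter(lines)
    header = next(lines).split()
    # After SNP A1 A2, the header lists one FID IID pair per sample.
    samples = list(zip(header[3::2], header[4::2]))
    idx = [i for i, sample in enumerate(samples) if sample in keep]

    out_header = header[:3]
    for i in idx:
        out_header.extend(samples[i])
    out = [" ".join(out_header)]

    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Keep SNP, A1, A2 plus the dosage column of each retained sample.
        out.append(" ".join(fields[:3] + [fields[3 + i] for i in idx]))
    return out


# The two SNP rows from test.dose above, and the pairs from list.txt.
dose = [
    "SNP A1 A2 F1 D1-40 F2 D2-32 F3 D3-30 F4 D4-49 F5 D5-30",
    "7:16719 A B 1.99800005159341 1.99599998397753 1.99800005159341 1.99800005159341 1.99599998397753",
    "7:31273 A B 1.55099993944168 1.92400002479553 1.91199994832277 1.9119999781251 1.94999995082617",
]
keep = {("F2", "D2-32"), ("F5", "D5-30")}

result = subset_dosage(dose, keep)
print("\n".join(result))
```

Samples are written in the order they appear in the dosage header, not the order in list.txt, so a matching .fam subset would need to follow the same order.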
I didn't test it, but GenGen might do it, for example:
combine_snptest.pl file1 -keep caseid.keep -prefix caseonly
Why do you need to subset?