Question: Using Plink, get allele frequencies for a subset of individuals
0
gravatar for Mr Locuace
2.2 years ago by
Mr Locuace90
Chile
Mr Locuace90 wrote:

Using Plink, I would like to calculate allele frequencies for a subset of individuals (cases) from a total cohort of 188 individuals (92 cases + 96 controls). I have proper .ped and .map files.

I have tried multiple options but I am not able to subset data. These are two typical ways I have tried:

1)

./plink --file /path/in_data --chr 1-22 --allow-extra-chr --filter  /path/cases.raw 1 --freq --make-bed --out out_data_cases

The .raw file has this format:

CAV-001 CAV-001 1
CAV-002 CAV-002 1
CAV-003 CAV-003 0
CAV-004 CAV-004 1

where the first column is Family ID, second column Individual ID, and in third column 1 are cases and 0 controls. I want to subset cases. All Family ID = Individual ID

2)

./plink --file /path/in_data --chr 1-22 --allow-extra-chr --keep  /path/cases.txt --freq --make-bed --out out_data_cases

The cases.txt file includes columns 1 and 2 from the .raw file.

This is what I get in the .log file (some paths are not shown):

16384 MB RAM detected; reserving 8192 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (868263 variants, 188 people).
....
868263 variants loaded from .bim file.
188 people (0 males, 0 females, 188 ambiguous) loaded from .fam.
Ambiguous sex IDs written to xxxx.nosex
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 188 founders and 0 nonfounders present.
Calculating allele frequencies... done.
--freq: Allele frequencies (founders only) written to xxxx.frq
868263 variants and 188 people pass filters and QC.
Note: No phenotypes present.
--make-bed to ....

My question is similar as Cannot remove subjects from Plink files but I have tried what they suggest there, without positive outcome. Please help !

subset plink individuals • 2.1k views
ADD COMMENTlink modified 2.2 years ago by pfs270 • written 2.2 years ago by Mr Locuace90
1
gravatar for Kevin Blighe
2.2 years ago by
Kevin Blighe52k
Kevin Blighe52k wrote:

My guess is an encoding issue with your hyphens. PLINK may have converted these to something else when you read your data into PLINK initially - have you checked the sample names as they appear in PLINK?

Also check the formatting of the hyphen in your /path/cases.txt file. There are different encodings of hyphens in ASCII.

ADD COMMENTlink written 2.2 years ago by Kevin Blighe52k

Thank you @Kevin Blighe. Plink should recognize the hyphens form the individuals names of the .ped file, as I am able to get the allele frequencies for the whole cohort but not for the cases subset. I did a trial just doing copy & paste from a few individuals from the .fam file (and also from the .ped file) and I get the same output.

ADD REPLYlink written 2.2 years ago by Mr Locuace90

Yes, you're correct that it should accept hyphens.

I realise that I have never actually used --keep in PLINK. For sample filtering, I have always conducted it outside of PLINK when I have my data in BCF or VCF format, and I filter with a command like this:

bcftools view -Ob --samples-file KeepCases.list MyData.bcf > MyDataCases.bcf

bcftools index MyDataCases.bcf

I then convert these BCFs to PLINK format (> PLINK 1.9 required)

plink --noweb --bcf MyDataCases.bcf --keep-allele-order --indiv-sort file KeepCases.IDSort.list --vcf-idspace-to _ --const-fid --allow-extra-chr 0 --split-x b37 no-fail --make-bed --out MyDataCases

KeepCases.list is jst a single column file of sample IDs KeepCases.IDSort.list is the standard 2-column format, as per your own file. It instructs PLINK to read in the data and maintain the sample ordering as per KeepCases.IDSort.list

I don't know if this is an option for you or not!

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Kevin Blighe52k
1

Thank you @Kevin Blighe, I will try that. I have Plink 1.9

ADD REPLYlink written 2.2 years ago by Mr Locuace90
1
gravatar for Maxime Lamontagne
2.2 years ago by
Québec
Maxime Lamontagne2.2k wrote:

Create a file with only the FID and IID of the subjects you want to keep (NewFile.txt). Then try this :

./plink --file /path/in_data --chr 1-22 --allow-extra-chr --keep NewFile.txt --pheno /path/cases.txt --freq --make-bed --out out_data_case
ADD COMMENTlink written 2.2 years ago by Maxime Lamontagne2.2k

Thank you @Maxime Lamontagne. I got the same output, just a .log file and a .nosex file with the 188 individuals. In addition, I got an error of a supposedly Duplicate ID (I check the files and I don't have duplicates):

Error: Duplicate ID 'CAV-131 CAV-131'.

ADD REPLYlink written 2.2 years ago by Mr Locuace90

Try this :

./plink --file /path/in_data --chr 1-22 --allow-extra-chr --keep NewFile.txt --make-bed --out Temp1 --allow-no-sex

./plink --bfile Temp1 --pheno /path/cases.txt --freq --out out_data_case --allow-no-sex

ADD REPLYlink written 2.2 years ago by Maxime Lamontagne2.2k

Thank you very much @Maxime Lamontagne. Now it partially works. Somehow I get the allele frequencies for 5 individuals, but not for the 92 cases and my NewFile.txt has 92 individuals.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Mr Locuace90

I gues one of your file is incorrectly formatted. If you want, I could take a quick look at them.

ADD REPLYlink written 2.2 years ago by Maxime Lamontagne2.2k
0
gravatar for pfs
2.2 years ago by
pfs270
USA/Boston
pfs270 wrote:

Your -keep file format is wrong. Create one -keep file that only includes the people with a '1' identifier. Then run these two commands.

  1. ./plink --file /path/in_data --keep new_file.txt --freq --out control
  2. /plink --file /path/in_data --remove new_file.txt --freq --out case
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by pfs270

Thank you @pfs, but I already mentioned that I did what you suggest in 2). Either you use --keep without identifier (.txt file with 2 columns) or you use --filter with identifier (.raw file with 3 columns, the third one has the identifier). Even if I use the --filter with only cases (1s), I get the same output.

ADD REPLYlink written 2.2 years ago by Mr Locuace90

Can you double check the path to the --keep file and make sure this file has the expected number of lines (wc -l keep_file). What version of Plink are you running?

ADD REPLYlink written 2.2 years ago by pfs270
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1842 users visited in the last hour