Question: Random Subset of Individuals from .BED File (.ped not available)
2.3 years ago by
angus.gane0 wrote:

I am trying to split a GWAS cohort into two random samples. I have the .bed, .fam, and .bim files. I know plink has commands for filtering out subsets of individuals (--filter), but this seems to require the .map file. It is possible to filter binary files in plink, but it doesn't seem to allow this for the first two 'columns', which contain the individual data I need to filter on.

My very computationally intensive solution has been to recode the .bed file to .ped and .map files for each chromosome (800GB+), randomly select a cohort of individuals with shuf, and then grep these out of the .ped file before recoding back to .bed files.

I was wondering if anyone had a better way of doing this?

Thanks, Angus

plink gwas • 686 views

Are you doing this for some 'machine learning' or bootstrapping method, i.e., breaking the dataset up into training and testing sets?

Just do the following:

  1. obtain a sample ID listing (for example, from the first two columns of the .fam file)
  2. 'randomly' select sample IDs from the listing (using any programming language)
  3. use --keep or --remove on your BED files to keep or remove samples accordingly
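
The three steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested pipeline: the function name split_fam, the file names, and the fixed seed are all hypothetical choices for illustration. It assumes a standard whitespace-delimited .fam file whose first two columns are the family ID and individual ID, which is the pair that --keep and --remove expect.

```python
import random


def split_fam(fam_path, keep_a_path, keep_b_path, seed=42):
    """Randomly split the samples listed in a PLINK .fam file into two halves.

    Writes two FID/IID lists (one pair per line) that can be passed to
    plink's --keep or --remove. All names here are illustrative.
    """
    with open(fam_path) as fam:
        # The first two columns of a .fam file are family ID and individual ID.
        ids = [tuple(line.split()[:2]) for line in fam if line.strip()]

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2  # with an odd count, the second half gets the extra sample

    for path, subset in ((keep_a_path, ids[:half]), (keep_b_path, ids[half:])):
        with open(path, "w") as out:
            for fid, iid in subset:
                out.write(f"{fid}\t{iid}\n")

    return ids[:half], ids[half:]
```

After calling, say, split_fam("cohort.fam", "half_a.txt", "half_b.txt"), each half could then be extracted directly from the binary fileset with something like `plink --bfile cohort --keep half_a.txt --make-bed --out cohort_a` (and the same with half_b.txt), where cohort is a placeholder for your own file prefix. This avoids recoding to .ped/.map entirely.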
written 2.3 years ago by Kevin Blighe