Random Subset of Individuals from .BED File (.ped not available)
3.0 years ago
angus.gane • 0

I am trying to split a GWAS cohort into two random samples. I have the .bed, .fam, and .bim files. I know plink has commands for filtering out subsets of individuals (--filter), but this seems to require the .map file. It is possible to filter binary files in plink, but it doesn't seem to allow this on the first two 'columns', which contain the individual data I need to filter on.

My very computationally intensive solution has been to recode the .bed file to .ped and .map files for each chromosome (800GB+), randomly select a cohort of individuals with shuf, and then grep these out of the .ped file before recoding as .bed files.

I was wondering if anyone had a better way of doing this?

Thanks, Angus

plink GWAS • 913 views

Are you doing this for some 'machine learning' or bootstrapping method, i.e., breaking the dataset up into training and testing sets?

Just do the following:

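  1. obtain a sample ID listing (the first two columns of the .fam file hold the family and individual IDs)
  2. 'randomly' select sample IDs from the listing (using any programming language)
  3. use --keep or --remove on your BED files to keep or remove samples accordingly (both take a text file with one family ID and individual ID pair per line)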
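
The three steps above can be sketched roughly as follows. This is only a minimal illustration, not a tested pipeline: the function name split_fam, the file names (cohort.fam, group_a.txt, group_b.txt) and the fixed seed are all assumptions made up for the example. It reads the family and individual IDs from the .fam file, shuffles them, and writes two ID lists that can be passed to --keep.

```python
import random
from pathlib import Path


def split_fam(fam_path, out_a, out_b, seed=42):
    """Randomly split the samples of a PLINK .fam file into two halves.

    Writes two text files of 'FID IID' lines (hypothetical helper,
    not part of plink) and returns the two lists of (FID, IID) tuples.
    """
    samples = []
    with open(fam_path) as fam:
        for line in fam:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed lines
                samples.append((fields[0], fields[1]))

    # Fixed seed (an arbitrary choice) so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(samples)

    half = len(samples) // 2
    group_a, group_b = samples[:half], samples[half:]

    for path, group in ((out_a, group_a), (out_b, group_b)):
        Path(path).write_text("".join(f"{fid} {iid}\n" for fid, iid in group))

    return group_a, group_b


# Example call, assuming a fileset named "cohort" (placeholder name):
#   split_fam("cohort.fam", "group_a.txt", "group_b.txt")
# Then, on the binary fileset directly (no .ped/.map needed):
#   plink --bfile cohort --keep group_a.txt --make-bed --out cohort_a
#   plink --bfile cohort --keep group_b.txt --make-bed --out cohort_b
```

Because --keep works on the .bed/.bim/.fam fileset itself, this avoids recoding to .ped at all. With an odd number of samples the second group gets the extra individual.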
