Random Subset of Individuals from .BED File (.ped not available)
3.0 years ago
angus.gane • 0

I am trying to split a GWAS cohort into two random samples. I have the .bed, .fam, and .bim files. I know plink has commands for filtering out subsets of individuals (--filter), but this seems to require the .map file. It is possible to filter binary files in plink, but it doesn't seem to allow filtering on the first two 'columns', which contain the individual IDs I need to filter on.

My very computationally intensive solution has been to recode the .bed files to .ped and .map files for each chromosome (800GB+), randomly select a cohort of individuals with shuf, and then grep these out of the .ped files before recoding back to .bed files.

I was wondering if anyone had a better way of doing this?

Thanks, Angus

plink GWAS • 913 views

Are you doing this for some 'machine learning' or bootstrapping method, i.e., breaking the dataset up into training and testing sets?

Just do the following:

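  1. obtain a sample ID listing (the first two columns of the .fam file hold the family and individual IDs)
  2. 'randomly' select sample IDs from the listing (using any programming language)
  3. use --keep or --remove on your BED files to keep or remove those samples; both take a file of family ID / individual ID pairs, so no .ped or .map file is needed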
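
The three steps above can be sketched in Python roughly as follows. This is only an illustration, not a tested pipeline: the helper names (`read_sample_ids`, `split_samples`, `write_id_file`, `plink_keep_command`), the 50/50 split, and the fixed seed are all assumptions made for the example. It relies only on the .fam file having the family ID and individual ID as its first two whitespace-separated columns, and on the PLINK flags --bfile, --keep, --make-bed and --out.

```python
import random


def read_sample_ids(fam_path):
    """Return (FID, IID) pairs taken from the first two columns of a .fam file."""
    ids = []
    with open(fam_path) as fam:
        for line in fam:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed lines
                ids.append((fields[0], fields[1]))
    return ids


def split_samples(ids, fraction=0.5, seed=42):
    """Shuffle a copy of ids and cut it into two disjoint lists.

    fraction and seed are illustrative defaults, not anything PLINK requires.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def write_id_file(ids, out_path):
    """Write one 'FID IID' pair per line, the layout --keep and --remove read."""
    with open(out_path, "w") as out:
        for fid, iid in ids:
            out.write(f"{fid} {iid}\n")


def plink_keep_command(bfile, id_file, out_prefix):
    """Build the PLINK argument list that extracts the listed samples."""
    return ["plink", "--bfile", bfile, "--keep", id_file,
            "--make-bed", "--out", out_prefix]
```

For a hypothetical fileset prefix `cohort`, one would write the first half to something like `half_a.txt` with `write_id_file`, then run `plink --bfile cohort --keep half_a.txt --make-bed --out cohort_a`. Running the same command with --remove in place of --keep should give the complementary half, so the .bed file never has to be recoded to .ped.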
