Extract Samples With Specific Rsid And Genotype Using Plink Or Similar Tools
8.8 years ago

I have a PLINK-formatted database (bed/bim/fam files) and the corresponding recoded files (hh/ped/map). I am looking for an efficient way to extract samples with specific genotypes from this database. From the PLINK manual I found that I can extract a set of samples with the "--keep" parameter and a set of SNPs with "--extract"; I am wondering whether both can be done in a single step using another parameter or tool.

My input is a list of rsIDs and genotypes; I need to get sample IDs and genotypes as output.

INPUT

rs1800562 AA


OUTPUT

Sample1 AA
Sample5 AA
Sample22 AA
...


Is there an option in PLINK to do this, or should I use Unix 'grep' and/or a custom script to extract the data? Suggestions of other computational genomics tools for similar tasks are also welcome.

plink genomics genotyping gwas • 5.5k views
8.7 years ago
Stephen 2.8k

You can combine --keep and --extract in a single step, but you want to condition your --keep on the genotypes you get from your --extract, which PLINK can't do to my knowledge. If you want a single ped file for each SNP, you could do something like

awk '{print $1}' INPUT > mysnps
# write a transposed fileset (mysnps.tped/mysnps.tfam); PLINK 1.9 spells this "--recode transpose"
plink --bfile data --extract mysnps --recode --transpose --out mysnps
(some code here to loop through each line of mysnps.tped, pull out the column index where the genotype matches, and write out a list of samples for each SNP)
(some code here to run plink --keep for each list of samples)


... but you probably already knew this, and just need an implementation. Sorry this wasn't much help.
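
For future readers, here is a minimal sketch of the tped-scanning step described above. It is not a built-in PLINK feature. It assumes a transposed fileset where mysnps.tped has four leading columns (chromosome, SNP ID, genetic distance, position) followed by two allele columns per sample, and where the companion mysnps.tfam lists samples in the same order with the individual ID in column 2. The function name match_genotype is made up for illustration.

```shell
# Hypothetical helper (not part of PLINK): print "sample genotype" for every
# sample whose genotype at one SNP matches the requested genotype.
#   $1 = fileset prefix (expects $1.tfam and $1.tped)
#   $2 = rsID, e.g. rs1800562
#   $3 = genotype written as two letters, e.g. AA or AG
match_genotype() {
    awk -v rsid="$2" -v gt="$3" '
        # First file (.tfam): remember the individual ID of each sample row.
        NR == FNR { sample[FNR] = $2; next }

        # Second file (.tped): only look at the row for the requested SNP.
        $2 == rsid {
            n = (NF - 4) / 2                 # number of samples on this row
            for (i = 1; i <= n; i++) {
                a = $(3 + 2 * i)             # first allele of sample i
                b = $(4 + 2 * i)             # second allele of sample i
                # Accept either allele order, so "G A" matches AG.
                if ((a b) == gt || (b a) == gt)
                    print sample[i], gt
            }
        }
    ' "$1.tfam" "$1.tped"
}
```

Calling `match_genotype mysnps rs1800562 AA` would then list the matching samples in the OUTPUT format from the question. To cover every line of INPUT, the call can be wrapped in a `while read -r rsid gt` loop. If the result is meant for --keep, the print statement would need to emit the family ID (column 1 of the .tfam) as well, since --keep expects family and individual IDs.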


Thanks a lot Stephen. I worked out a solution based on your suggestion - tped was the hat-tip :). Please see if you can add this as an answer for future reference.