How to make a small subset of a large genotyping dataset with plink?
2.9 years ago
kynnjo ▴ 40

I have a collection of genotyping files that, for the purposes of this question, I will call big.bed, big.ped, big.fam, big.map, big.vcf, etc. This dataset has information on ~1.8M SNPs and 877 samples.

I also have a list of ~1000 SNPs in a file wanted_snps.txt, one SNP per line.

I want to generate a collection of files tiny.bed, tiny.ped, tiny.fam, tiny.map, tiny.vcf consisting of the subsets of the data in the big.* files corresponding to the SNPs mentioned in wanted_snps.txt.

(In case it matters, we can safely assume that all the SNPs mentioned in wanted_snps.txt are represented in the big.* dataset.)

I understand that one can perform such subsetting using plink, but after poring over the online documentation, I still can't figure out how to do this.

Could someone show me what commands I'd need to run to do this?

I am using plink version 1.9.

Thanks in advance!

2.9 years ago

Hello,

If I am not wrong, you can use the --extract option in plink to do this.

To extract only a subset of SNPs, it is possible to specify a list of required SNPs and make a new file, or perform an analysis on this subset, by using the command

plink --file data --extract mysnps.txt

where the file is just a list of SNPs, one per line, e.g.
snp005
snp008
snp101
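
The IDs in the list have to match the variant IDs stored in the dataset, so a quick check before extracting can save some confusion. Here is a minimal sketch, assuming the binary fileset includes a big.bim file with the variant ID in its second column. The name `missing_snps` is a hypothetical helper, not part of plink.

```shell
# Print every ID from the SNP list that does not occur in the .bim file.
# usage: missing_snps big.bim wanted_snps.txt
missing_snps() {
    # First file (.bim): remember column 2, the variant ID.
    # Second file (SNP list): print non-blank IDs that were never seen.
    awk 'NR == FNR { seen[$2] = 1; next } NF && !($1 in seen) { print $1 }' "$1" "$2"
}
```

If `missing_snps big.bim wanted_snps.txt` prints nothing, every listed SNP is present in the dataset.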

http://zzz.bwh.harvard.edu/plink/dataman.shtml#extract
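
The linked page documents the older plink 1.07. With plink 1.9 you will typically want to add an explicit output flag and an output prefix as well. The steps could be combined roughly as follows. This is a sketch rather than a tested recipe: it assumes the binary fileset big.bed/big.bim/big.fam exists under the prefix big. The function name `subset_snps` and the `PLINK` override variable are illustrative only and not part of plink.

```shell
# usage: subset_snps <input prefix> <snp list> <output prefix>
subset_snps() {
    src=$1 snps=$2 out=$3
    plink=${PLINK:-plink}   # set PLINK to point at a specific plink 1.9 binary

    # 1. Keep only the listed variants; writes $out.bed, $out.bim, $out.fam
    "$plink" --bfile "$src" --extract "$snps" --make-bed --out "$out" || return 1
    # 2. Text fileset from the small binary one; writes $out.ped, $out.map
    "$plink" --bfile "$out" --recode --out "$out" || return 1
    # 3. VCF export from the same small fileset; writes $out.vcf
    "$plink" --bfile "$out" --recode vcf --out "$out" || return 1
}
```

For the files in the question this would be called as `subset_snps big wanted_snps.txt tiny`. Extracting once and then converting the small fileset avoids re-reading the 1.8M-SNP dataset for each output format.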

Hope this solves your query.
