Question: How to Extract Random SNPs from Whole-Genome Data?
2
7.5 years ago, User 1793 (40) wrote:

Hi All,

I have a dataset containing more than 500K SNPs. I need to extract 15K random SNPs from it. Please help me do so.

Thanks

extraction snp • 6.7k views
written 7.5 years ago by User 1793 (40)
5
7.5 years ago, Maxime Lamontagne (2.1k), Québec, wrote:

Convert your files to PED format

plink --bfile file1 --recode --out file2

Extract the SNP ID column

cut -f 2 file2.map > snps.map

Choose 15K SNPs at random

shuf -n 15000 snps.map > snps.subset.map

Extract those SNPs from your first file

plink --bfile file1 --extract snps.subset.map --make-bed --out file3

written 7.5 years ago by Maxime Lamontagne (2.1k)

perfect! Thanks a lot.

written 7.5 years ago by User 1793 (40)

This adds an extra step of making a plain-text PED file, and not all Unix systems have shuf installed. sort -R on the BIM file is all you need.

written 7.5 years ago by Caddymob (950)
4
7.5 years ago, Pablo (1.9k), Canada, wrote:

I think the Unix command 'shuf' does the trick (assuming the SNPs are one per line in a text/VCF file):

shuf -n 15000 snps_file.vcf

written 7.5 years ago by Pablo (1.9k)
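
One caveat when running shuf directly on a VCF: header lines (those starting with #) get shuffled and sampled along with the variant records. A minimal sketch that keeps the header intact, assuming an uncompressed VCF and GNU shuf. The function name sample_vcf is hypothetical, not part of any tool mentioned in this thread:

```shell
# Sample N variant records from an uncompressed VCF, keeping the header.
# sample_vcf is a hypothetical helper name; shuf is assumed to be GNU coreutils.
sample_vcf() {
    vcf=$1
    n=$2
    # Header lines (starting with '#') are passed through unchanged.
    grep '^#' "$vcf"
    # Variant records are sampled without replacement, in random order.
    grep -v '^#' "$vcf" | shuf -n "$n"
}

# Usage: sample_vcf snps_file.vcf 15000 > subset.vcf
```

The sampled records come out in random order, so they may need re-sorting by chromosome and position if a downstream tool expects a sorted VCF.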

thanks, I have the file in .bed, .bim, .fam format!

written 7.5 years ago by User 1793 (40)

Use PLINK to create a VCF file and follow Pablo's suggestion. Or use the PLINK R interface to do the same.

written 7.5 years ago by Aaronquinlan (11k)

plink --bfile file1 --recode --out file2

cut -f 2 file2.map > snps.map

shuf -n 15000 snps.map > snps.subset.map

plink --bfile file1 --extract snps.subset.map --make-bed --out file3

written 7.5 years ago by Maxime Lamontagne (2.1k)

The command "shuf" is not found when I try to run this in the terminal on OS X.

written 4.8 years ago by Scott (80)
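
Where shuf is not available, one possible workaround is to do the shuffling with awk, sort, head, and cut, which ship with essentially every Unix system. The sketch below tags each SNP ID from the BIM file with a random key, sorts on that key, and keeps the first N. The function name random_snps is hypothetical, and the approach is offered as an assumption-laden alternative rather than what the answers above used:

```shell
# Pick N random SNP IDs from a PLINK .bim file without shuf or sort -R.
# random_snps is a hypothetical helper name.
random_snps() {
    bim=$1
    n=$2
    # Prefix each SNP ID (column 2 of the .bim) with a random key.
    # srand() seeds from the clock, so two runs in the same second can match.
    awk 'BEGIN { srand() } { printf "%.10f\t%s\n", rand(), $2 }' "$bim" |
        sort -k1,1n |   # order lines by the random key
        head -n "$n" |  # keep the first N lines
        cut -f 2        # drop the key, leaving only the SNP ID
}

# Usage: random_snps file1.bim 15000 > snps.subset.map
```

The resulting list can then be passed to plink --extract exactly as in the answer above.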
2
7.5 years ago, Caddymob (950), United States, wrote:

You can also just use Unix sort to randomly grab lines out of your BIM file:

sort -R yourdata.bim | head -n 15000 | awk '{print $2}' > random15k.snps
plink --bfile yourdata --extract random15k.snps --make-bed --out random15k

This avoids the time and disk space needed to convert your data to a plain-text PED file and keeps everything binary for speed and disk friendliness =)

modified 7.5 years ago • written 7.5 years ago by Caddymob (950)

thanks. this also worked perfectly.

written 7.5 years ago by User 1793 (40)