How to Select A Given Subset of SNPs from the 1000GP Dataset?
4.5 years ago
chiaoyu • 0

Dear Biostar Experts,

I am currently working on a SNP dataset for which I need individual-level data to do LD pruning. Since the original dataset doesn't come with individual-level data, I was told to use the 1000GP individual data for the pruning. However, the dataset we are interested in contains around 2 million SNPs, far fewer than 1000GP, so we need the SNPs in the intersection of these 2 million and those in 1000GP. Is there a way to download only that intersection from 1000GP? If that's not viable and we need to download the whole of 1000GP, how can we select the subset on our own machine?

Thanks a lot for any help!

Best, Chiao-Yu

SNP • 702 views
4.5 years ago

The entire 1000GP phase 3 dataset is only a ~3.6 GiB download from https://www.cog-genomics.org/plink/2.0/resources, or ~3 GiB if you don't need the VCF annotations. From there, it should not be too difficult to intersect with your dataset.
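
One way the intersection step could look, as a minimal sketch rather than a definitive recipe: collect the variant IDs from your own dataset, scan the 1000GP variant file, and keep the IDs that occur in both. The function name `shared_variant_ids` is hypothetical, and the sketch assumes that the reference file is a decompressed, tab-separated .pvar with a `#CHROM ...` header line containing an `ID` column, and that both datasets use the same ID scheme (e.g. rsIDs). If the IDs differ, you would have to match on chromosome and position instead.

```python
from typing import Iterable, List, Set


def shared_variant_ids(own_ids: Iterable[str], pvar_lines: Iterable[str]) -> List[str]:
    """Return the IDs in the reference .pvar that also occur in own_ids.

    Assumption: pvar_lines is a tab-separated .pvar with a '#CHROM ...'
    header line that names an 'ID' column. Order follows the reference file.
    """
    wanted: Set[str] = {s.strip() for s in own_ids if s.strip()}
    id_col = None
    shared: List[str] = []
    for line in pvar_lines:
        # Skip blank lines and '##' meta-information lines.
        if not line.strip() or line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Locate the ID column from the header instead of hard-coding it.
            id_col = fields.index("ID")
            continue
        if id_col is None:
            raise ValueError("expected a '#CHROM ...' header line before variant rows")
        if len(fields) > id_col and fields[id_col] in wanted:
            shared.append(fields[id_col])
    return shared


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real .pvar file (hypothetical variants).
    demo_pvar = [
        "##fileformat=VCFv4.3\n",
        "#CHROM\tPOS\tID\tREF\tALT\n",
        "1\t100\trs1\tA\tG\n",
        "1\t200\trs2\tC\tT\n",
    ]
    print(shared_variant_ids(["rs2", "rs9"], demo_pvar))
```

Written out one ID per line, the resulting list can be passed to PLINK's `--extract` flag to keep only those variants before LD pruning. With real data you would pass open file handles in place of the in-memory lists.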
