Extracting Sample Sizes From The NHGRI GWAS Catalog
10.8 years ago

Hi all,

I'm trying to generate a plot comparing the sample sizes of published GWAS with the number of associations each found at p < 10^-8.

I've been using the NHGRI Catalog to obtain the relevant studies. Identifying the significant findings is straightforward, but the sample sizes are given in prose, with little consistency in structure. For example, some are listed as "#cases, #controls", while others say "up to # individuals", and so on. This means there is no obvious string separator to use to extract just the numbers.

Does anyone know of either a) a database of sample sizes for GWAS which lists the sample sizes numerically rather than as prose; or b) a way I can extract the sample sizes from the catalog (without manually going through several thousand papers...)?

Thanks!

gwas • 2.5k views

Hi! I'm very interested in the plot you planned to draw. Did you manage to make it? Would you mind sharing it with me? I will definitely give you credit. Thanks!

10.8 years ago
Richard Smith ▴ 400

I think the HuGE GWAS Navigator includes all data from the NHGRI GWAS Catalog, plus other sources. Its file has a sample size column covering the initial and, where applicable, replication samples, and it is populated for many of the entries. The counts and populations are still in prose, but I think they are a bit more consistent, so they should be easier to parse. The disease/trait names from HuGE are certainly more consistent.
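Since the counts are in prose either way, one rough way to pull the numbers out is a regular expression that grabs every integer in the description and sums them. This is only a sketch: the function name `total_sample_size` and the example strings are made up for illustration and are not taken from either file, so check the result against a handful of real entries first.

```python
import re

# Matches integers with or without thousands separators, e.g. "1,234" or "5678".
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")


def total_sample_size(description):
    """Sum every integer in a prose sample-size description.

    Returns None when the text contains no digits at all.
    """
    counts = [int(m.replace(",", "")) for m in NUMBER.findall(description)]
    return sum(counts) if counts else None


# Hypothetical descriptions in the two styles mentioned in the question.
print(total_sample_size("1,234 cases, 5,678 controls"))
print(total_sample_size("Up to 12,000 individuals"))
```

The obvious weakness is that any digit in the text is counted, so a trait name such as "type 2 diabetes" inside the description would add 2 to the total. Entries like that would need to be filtered or checked by hand.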
