Question: Extracting Sample Sizes From The NHGRI GWAS Catalog
5.6 years ago by
European Union
coleman_jonathan410 wrote:

Hi all,

I'm trying to generate a plot comparing the sample sizes of published GWAS with the number of associations each found with p < 10^-8.

I've been using the NHGRI Catalog to obtain the relevant studies. Identifying the significant findings is straightforward, but the sample sizes are given as free-text descriptions with little consistency in their structure. For example, some are listed as "#cases, #controls", while others say "up to # individuals", and so on. This means there is no obvious string separator to use to extract just the numbers.

Does anyone know of either a) a database of sample sizes for GWAS which lists the sample sizes numerically rather than as prose; or b) a way I can extract the sample sizes from the catalog (without manually going through several thousand papers...)?


gwas • 1.6k views

Hi! I'm very interested in the plot you planned to draw. Did you manage to make it? Would you mind sharing it with me? I will definitely give you credit. Thanks!

written 2.5 years ago by lybird3000
5.6 years ago by
Richard Smith400
Cambridge, UK
Richard Smith400 wrote:

I think the HuGE GWAS Navigator includes all the data from the NHGRI GWAS Catalog, plus data from other sources. Its file has a sample size column, covering both the initial and the replication samples where applicable, and it is populated for a lot of the entries. The counts and populations are still in prose, but I think they are a bit more consistent, so they should be easier to parse. The disease/trait names from HuGE are certainly more consistent.
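
As a rough illustration of parsing counts out of such prose, here is a minimal Python sketch. It pulls every integer out of a description and sums them. The function names and the example strings are hypothetical, not taken from either resource, and the sketch assumes that every number in the description is a participant count and that commas inside numbers are thousands separators.

```python
import re

# Integers with optional thousands separators, e.g. "1,234" or "567".
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")


def extract_counts(description):
    """Return every integer found in a prose sample description."""
    return [int(token.replace(",", "")) for token in NUMBER.findall(description)]


def total_sample_size(description):
    """Sum all counts in a description (0 if it contains no numbers)."""
    return sum(extract_counts(description))


# Hypothetical descriptions, made up for illustration only.
print(extract_counts("1,200 European ancestry cases, 3,400 European ancestry controls"))
# [1200, 3400]
print(total_sample_size("Up to 12,345 individuals"))
# 12345
```

One caveat: a bare pair written without a space, such as "500,300", would be read as the single number 500300, so it is worth spot-checking a sample of rows by hand before trusting the totals.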
