Hi all,
I'm trying to generate a plot comparing the sample sizes of published GWAS with the number of associations each found at p < 10^-8.
I've been using the NHGRI Catalog to obtain the relevant studies. Identifying the significant findings is straightforward, but the sample sizes are given in free-text lines with little consistency in their structure. For example, some are listed as "#cases, #controls", while others say "up to # individuals", etc. This means there is no obvious separator string to split on to extract just the numbers.
Does anyone know of either a) a database of sample sizes for GWAS which lists the sample sizes numerically rather than as prose; or b) a way I can extract the sample sizes from the catalog (without manually going through several thousand papers...)?
Thanks!
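For option (b), one rough possibility is to skip separators entirely and pull every integer out of each prose line with a regular expression, then sum them. The snippet below is only a minimal sketch under stated assumptions: the function names and the example strings are made up for illustration and are not taken from the catalog, and it assumes every number in a line is a participant count.

```python
import re

# Integers with optional thousands separators, e.g. "1,234" or "5678".
# The lookahead stops "1,2345" from being read as "1,234" followed by "5".
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?!\d)|\d+")


def extract_counts(text):
    """Return every integer found in a free-text sample description."""
    return [int(match.replace(",", "")) for match in NUMBER.findall(text)]


def total_sample_size(text):
    """Sum all integers in the description (assumes each one is a head count)."""
    return sum(extract_counts(text))


# Hypothetical example strings, not real catalog entries.
examples = [
    "1,234 cases, 5,678 controls",
    "Up to 12000 individuals",
]
for line in examples:
    print(line, "->", extract_counts(line), "total:", total_sample_size(line))
```

This will over-count whenever a line contains numbers that are not sample sizes (for example a count of cohorts), and a comma-separated pair with no space such as "234,567" would be read as a single number, so the results would still need spot-checking against a handful of papers.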
Hi! I'm very interested in the plot you were planning to draw. Did you manage to make it? Would you mind sharing it with me? I will definitely give you credit. Thanks!