I want to query the GWAS Catalog using the gwascat package in R. I was surprised to see that makeCurrentGwascat() returns only 6,427 associations when there are many more in the GWAS Catalog. Is this what I should expect, or is something going wrong here?

> cat1 <- makeCurrentGwascat()
running read.delim on
formatting gwaswloc instance...
NOTE: input data had non-ASCII characters replaced by '*'.
Warning message:
In gwdf2GRanges(tab, extractDate = as.character(Sys.Date())) :
  NAs introduced by coercion
> cat1
gwasloc instance with 6427 records and 38 attributes per record.
Extracted:  2021-01-12 
Genome:  GRCh38 
GRanges object with 5 ranges and 3 metadata columns:
      seqnames    ranges strand |                 DISEASE/TRAIT        SNPS   P-VALUE
         <Rle> <IRanges>  <Rle> |                   <character> <character> <numeric>
  [1]       22  41151150      * | General risk tolerance (MTAG)  rs75843224     6e-14
  [2]        1 207861610      * | General risk tolerance (MTAG)    rs984983     6e-14
  [3]        2  59787624      * | General risk tolerance (MTAG)   rs6732097     6e-14
  [4]       12 102069362      * | General risk tolerance (MTAG)  rs17437668     9e-14
  [5]        6  26173250      * | General risk tolerance (MTAG)  rs34661691     9e-14
  seqinfo: 23 sequences from GRCh38 genome

Contrast this with the 2016 snapshot that ships with the package, which has more associations:

gwasloc instance with 22714 records and 36 attributes per record.
Extracted:  2016-01-18 
Genome:  GRCh38 
GRanges object with 5 ranges and 3 metadata columns:
      seqnames    ranges strand |                  DISEASE/TRAIT        SNPS   P-VALUE
         <Rle> <IRanges>  <Rle> |                    <character> <character> <numeric>
  [1]       11  41798900      * | Post-traumatic stress disorder  rs10768747     5e-06
  [2]       15  34768262      * | Post-traumatic stress disorder  rs12232346     2e-06
  [3]        8  96500749      * | Post-traumatic stress disorder   rs2437772     6e-06
  [4]        9  98221544      * | Post-traumatic stress disorder   rs7866350     1e-06
  [5]       15  54423444      * | Post-traumatic stress disorder  rs73419609     6e-06
  seqinfo: 23 sequences from GRCh38 genome

My session info:

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gwascat_2.18.0                          Homo.sapiens_1.3.1                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2                    
 [5] GO.db_3.10.0                            OrganismDbi_1.28.0                      GenomicFeatures_1.38.2                  GenomicRanges_1.38.0                   
 [9] GenomeInfoDb_1.22.1                     AnnotationDbi_1.48.0                    IRanges_2.20.2                          S4Vectors_0.24.4                       
[13] Biobase_2.46.0                          BiocGenerics_0.32.0                    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5                  lattice_0.20-41             prettyunits_1.1.1           Rsamtools_2.2.3             Biostrings_2.54.0           assertthat_0.2.1           
 [7] digest_0.6.27               asreml_4.1.0.110            BiocFileCache_1.10.2        R6_2.5.0                    RSQLite_2.2.2               httr_1.4.2                 
[13] ggplot2_3.3.3               pillar_1.4.7                zlibbioc_1.32.0             rlang_0.4.10                progress_1.2.2              curl_4.3                   
[19] rstudioapi_0.13             data.table_1.13.6           blob_1.2.1                  Matrix_1.2-18               BiocParallel_1.20.1         stringr_1.4.0              
[25] RCurl_1.98-1.2              bit_4.0.4                   biomaRt_2.42.1              munsell_0.5.0               DelayedArray_0.12.3         compiler_3.6.2             
[31] rtracklayer_1.46.0          pkgconfig_2.0.3             askpass_1.1                 openssl_1.4.3               tidyselect_1.1.0            SummarizedExperiment_1.16.1
[37] tibble_3.0.4                GenomeInfoDbData_1.2.2      matrixStats_0.57.0          XML_3.99-0.3                crayon_1.3.4                dplyr_1.0.2                
[43] dbplyr_2.0.0                GenomicAlignments_1.22.1    bitops_1.0-6                rappdirs_0.3.1              RBGL_1.62.1                 grid_3.6.2                 
[49] gtable_0.3.0                lifecycle_0.2.0             DBI_1.1.0                   magrittr_2.0.1              scales_1.1.1                graph_1.64.0               
[55] stringi_1.5.3               XVector_0.26.0              ellipsis_0.3.1              generics_0.1.0              vctrs_0.3.6                 tools_3.6.2                
[61] bit64_4.0.5                 glue_1.4.2                  purrr_0.3.4                 hms_0.5.3                   colorspace_2.0-0            BiocManager_1.30.10        
[67] memoise_1.1.0

Thanks all.

Try running cat1 <- makeCurrentGwascat() again. The low record count is probably due to a network problem during the download. Also, cat is not a good variable name, as R already has a cat() function.
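
To illustrate the naming point, here is a small sketch of what happens when a variable shares its name with base R's cat(). The variable contents are made up for the example. R still finds the function when the name is used in a call, but the name refers to the variable wherever it is used as a value, which is where the confusion comes from.

```r
# A variable that shares its name with the base function cat()
cat <- c(1, 2, 3)

# In call position R looks specifically for a function, so base::cat is still found
cat("cat() still works\n")

# Used as a value, the name now refers to the numeric vector, not the function
shadowed_is_function <- is.function(cat)
print(shadowed_is_function)

# Remove the variable so the name points only at base::cat again
rm(cat)
```

A distinct name such as cat1 or gwas_current avoids the ambiguity altogether.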

My result:

Thanks for the advice on variable naming, I've updated my post.

I tried again this morning and can still only get the 6427 records. I wouldn't say that my internet is overly bad - it's not had problems downloading things before. Is your gwascat version the same as mine?

Package gwascat version 2.20.1

I updated R so that I could update to this version, and now it works :) Thanks!
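
For anyone hitting the same thing, here is a minimal sketch of checking whether an installed gwascat is older than the version that worked in this thread. The helper name is_older_than is hypothetical, and treating 2.20.1 as the threshold is an assumption based only on the two versions reported above, not a documented fix version. The commented lines need the package installed and network access, so they are left to run interactively.

```r
# Hypothetical helper: TRUE when `installed` is an older version than `minimum`.
# Accepts a character string or the object returned by packageVersion().
is_older_than <- function(installed, minimum) {
  utils::compareVersion(as.character(installed), minimum) < 0
}

# The two versions reported in this thread
print(is_older_than("2.18.0", "2.20.1"))  # version that returned 6427 records
print(is_older_than("2.20.1", "2.20.1"))  # version that reportedly worked

# On a real system one might then run (requires gwascat and network access):
# is_older_than(packageVersion("gwascat"), "2.20.1")
# BiocManager::install("gwascat")   # after upgrading R and Bioconductor
# gwas_current <- gwascat::makeCurrentGwascat()
# length(gwas_current)              # number of association records
```

Bioconductor package versions are tied to the Bioconductor release, which in turn is tied to the R version, so getting a newer gwascat generally means upgrading R first, as happened here.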


