Question: Incomplete GWAS Catalog Data from makeCurrentGwascat() [R, gwascat]
13 days ago, anailis wrote:

I want to query the GWAS Catalog using the gwascat package in R. I was surprised to see that makeCurrentGwascat() returns only 6,427 associations when the GWAS Catalog contains many more. Is this expected, or is something going wrong here?

> cat1 <- makeCurrentGwascat()
running read.delim on
formatting gwaswloc instance...
NOTE: input data had non-ASCII characters replaced by '*'.
Warning message:
In gwdf2GRanges(tab, extractDate = as.character(Sys.Date())) :
  NAs introduced by coercion
> cat1
gwasloc instance with 6427 records and 38 attributes per record.
Extracted:  2021-01-12 
Genome:  GRCh38 
GRanges object with 5 ranges and 3 metadata columns:
      seqnames    ranges strand |                 DISEASE/TRAIT        SNPS   P-VALUE
         <Rle> <IRanges>  <Rle> |                   <character> <character> <numeric>
  [1]       22  41151150      * | General risk tolerance (MTAG)  rs75843224     6e-14
  [2]        1 207861610      * | General risk tolerance (MTAG)    rs984983     6e-14
  [3]        2  59787624      * | General risk tolerance (MTAG)   rs6732097     6e-14
  [4]       12 102069362      * | General risk tolerance (MTAG)  rs17437668     9e-14
  [5]        6  26173250      * | General risk tolerance (MTAG)  rs34661691     9e-14
  seqinfo: 23 sequences from GRCh38 genome

Contrast this with the data bundled with the package, extracted in 2016, which has more associations:

gwasloc instance with 22714 records and 36 attributes per record.
Extracted:  2016-01-18 
Genome:  GRCh38 
GRanges object with 5 ranges and 3 metadata columns:
      seqnames    ranges strand |                  DISEASE/TRAIT        SNPS   P-VALUE
         <Rle> <IRanges>  <Rle> |                    <character> <character> <numeric>
  [1]       11  41798900      * | Post-traumatic stress disorder  rs10768747     5e-06
  [2]       15  34768262      * | Post-traumatic stress disorder  rs12232346     2e-06
  [3]        8  96500749      * | Post-traumatic stress disorder   rs2437772     6e-06
  [4]        9  98221544      * | Post-traumatic stress disorder   rs7866350     1e-06
  [5]       15  54423444      * | Post-traumatic stress disorder  rs73419609     6e-06
  seqinfo: 23 sequences from GRCh38 genome

My session info:

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gwascat_2.18.0                          Homo.sapiens_1.3.1                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2                    
 [5] GO.db_3.10.0                            OrganismDbi_1.28.0                      GenomicFeatures_1.38.2                  GenomicRanges_1.38.0                   
 [9] GenomeInfoDb_1.22.1                     AnnotationDbi_1.48.0                    IRanges_2.20.2                          S4Vectors_0.24.4                       
[13] Biobase_2.46.0                          BiocGenerics_0.32.0                    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5                  lattice_0.20-41             prettyunits_1.1.1           Rsamtools_2.2.3             Biostrings_2.54.0           assertthat_0.2.1           
 [7] digest_0.6.27               asreml_4.1.0.110            BiocFileCache_1.10.2        R6_2.5.0                    RSQLite_2.2.2               httr_1.4.2                 
[13] ggplot2_3.3.3               pillar_1.4.7                zlibbioc_1.32.0             rlang_0.4.10                progress_1.2.2              curl_4.3                   
[19] rstudioapi_0.13             data.table_1.13.6           blob_1.2.1                  Matrix_1.2-18               BiocParallel_1.20.1         stringr_1.4.0              
[25] RCurl_1.98-1.2              bit_4.0.4                   biomaRt_2.42.1              munsell_0.5.0               DelayedArray_0.12.3         compiler_3.6.2             
[31] rtracklayer_1.46.0          pkgconfig_2.0.3             askpass_1.1                 openssl_1.4.3               tidyselect_1.1.0            SummarizedExperiment_1.16.1
[37] tibble_3.0.4                GenomeInfoDbData_1.2.2      matrixStats_0.57.0          XML_3.99-0.3                crayon_1.3.4                dplyr_1.0.2                
[43] dbplyr_2.0.0                GenomicAlignments_1.22.1    bitops_1.0-6                rappdirs_0.3.1              RBGL_1.62.1                 grid_3.6.2                 
[49] gtable_0.3.0                lifecycle_0.2.0             DBI_1.1.0                   magrittr_2.0.1              scales_1.1.1                graph_1.64.0               
[55] stringi_1.5.3               XVector_0.26.0              ellipsis_0.3.1              generics_0.1.0              vctrs_0.3.6                 tools_3.6.2                
[61] bit64_4.0.5                 glue_1.4.2                  purrr_0.3.4                 hms_0.5.3                   colorspace_2.0-0            BiocManager_1.30.10        
[67] memoise_1.1.0

Thanks all.

modified 12 days ago • written 13 days ago by anailis

12 days ago, Zhilong Jia wrote:

Rerun cat1 <- makeCurrentGwascat(). The short result is probably due to your internet connection. Also, cat is not a good variable name, because R already has a cat function.

My result:

modified 12 days ago • written 12 days ago by Zhilong Jia

Thanks for the advice on variable naming, I've updated my post.

I tried again this morning and can still only get the 6427 records. I wouldn't say that my internet is overly bad - it's not had problems downloading things before. Is your gwascat version the same as mine?

written 12 days ago by anailis

Package gwascat version 2.20.1

written 12 days ago by Zhilong Jia

I updated R so that I could update to this version, and now it works :) Thanks!

written 12 days ago by anailis
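
For anyone landing here with the same symptom, here is a minimal sketch of the version check that resolved this thread. The helper needs_update() is hypothetical (it is not part of gwascat), and the default minimum of 2.20.1 is simply the version reported to work above, not a documented threshold. The install commands at the end are left commented out and assume you have already upgraded R, since each Bioconductor release is tied to a specific R version.

```r
# Hypothetical helper (not part of gwascat): TRUE when the given version
# is older than the version reported to work in this thread.
needs_update <- function(installed, minimum = "2.20.1") {
  package_version(installed) < package_version(minimum)
}

# Version from the question's sessionInfo()
needs_update("2.18.0")  # TRUE

# On your own machine, check the installed package if it is present.
if (nzchar(system.file(package = "gwascat"))) {
  installed <- as.character(utils::packageVersion("gwascat"))
  message("gwascat ", installed, " installed; update needed: ",
          needs_update(installed))
}

# After upgrading R, a Bioconductor package is typically updated with:
# install.packages("BiocManager")
# BiocManager::install("gwascat")
```

If needs_update() returns TRUE for your installed version, upgrading R and reinstalling gwascat as in the last reply is the first thing to try before suspecting the network.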