Incomplete GWAS Catalog Data from makeCurrentGwascat() [R, gwascat]
4 months ago
anailis • 0

I want to query the GWAS Catalog using the gwascat package in R. I was surprised to see that makeCurrentGwascat() returns only 6,427 associations when the GWAS Catalog contains many more. Is this expected, or is something going wrong?

> cat1 <- makeCurrentGwascat()
running read.delim on http://www.ebi.ac.uk/gwas/api/search/downloads/alternative...
formatting gwaswloc instance...
NOTE: input data had non-ASCII characters replaced by '*'.
Warning message:
In gwdf2GRanges(tab, extractDate = as.character(Sys.Date())) :
  NAs introduced by coercion
> cat1
gwasloc instance with 6427 records and 38 attributes per record.
Extracted:  2021-01-12 
Genome:  GRCh38 
Excerpt:
GRanges object with 5 ranges and 3 metadata columns:
      seqnames    ranges strand |                 DISEASE/TRAIT        SNPS   P-VALUE
         <Rle> <IRanges>  <Rle> |                   <character> <character> <numeric>
  [1]       22  41151150      * | General risk tolerance (MTAG)  rs75843224     6e-14
  [2]        1 207861610      * | General risk tolerance (MTAG)    rs984983     6e-14
  [3]        2  59787624      * | General risk tolerance (MTAG)   rs6732097     6e-14
  [4]       12 102069362      * | General risk tolerance (MTAG)  rs17437668     9e-14
  [5]        6  26173250      * | General risk tolerance (MTAG)  rs34661691     9e-14
  -------
  seqinfo: 23 sequences from GRCh38 genome

Contrast this with the 2016 data that ships with the package, which has more associations:

data(ebicat38)
ebicat38
gwasloc instance with 22714 records and 36 attributes per record.
Extracted:  2016-01-18 
Genome:  GRCh38 
Excerpt:
GRanges object with 5 ranges and 3 metadata columns:
      seqnames    ranges strand |                  DISEASE/TRAIT        SNPS   P-VALUE
         <Rle> <IRanges>  <Rle> |                    <character> <character> <numeric>
  [1]       11  41798900      * | Post-traumatic stress disorder  rs10768747     5e-06
  [2]       15  34768262      * | Post-traumatic stress disorder  rs12232346     2e-06
  [3]        8  96500749      * | Post-traumatic stress disorder   rs2437772     6e-06
  [4]        9  98221544      * | Post-traumatic stress disorder   rs7866350     1e-06
  [5]       15  54423444      * | Post-traumatic stress disorder  rs73419609     6e-06
  -------
  seqinfo: 23 sequences from GRCh38 genome

My session info:

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gwascat_2.18.0                          Homo.sapiens_1.3.1                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 org.Hs.eg.db_3.10.0                    
 [5] GO.db_3.10.0                            OrganismDbi_1.28.0                      GenomicFeatures_1.38.2                  GenomicRanges_1.38.0                   
 [9] GenomeInfoDb_1.22.1                     AnnotationDbi_1.48.0                    IRanges_2.20.2                          S4Vectors_0.24.4                       
[13] Biobase_2.46.0                          BiocGenerics_0.32.0                    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5                  lattice_0.20-41             prettyunits_1.1.1           Rsamtools_2.2.3             Biostrings_2.54.0           assertthat_0.2.1           
 [7] digest_0.6.27               asreml_4.1.0.110            BiocFileCache_1.10.2        R6_2.5.0                    RSQLite_2.2.2               httr_1.4.2                 
[13] ggplot2_3.3.3               pillar_1.4.7                zlibbioc_1.32.0             rlang_0.4.10                progress_1.2.2              curl_4.3                   
[19] rstudioapi_0.13             data.table_1.13.6           blob_1.2.1                  Matrix_1.2-18               BiocParallel_1.20.1         stringr_1.4.0              
[25] RCurl_1.98-1.2              bit_4.0.4                   biomaRt_2.42.1              munsell_0.5.0               DelayedArray_0.12.3         compiler_3.6.2             
[31] rtracklayer_1.46.0          pkgconfig_2.0.3             askpass_1.1                 openssl_1.4.3               tidyselect_1.1.0            SummarizedExperiment_1.16.1
[37] tibble_3.0.4                GenomeInfoDbData_1.2.2      matrixStats_0.57.0          XML_3.99-0.3                crayon_1.3.4                dplyr_1.0.2                
[43] dbplyr_2.0.0                GenomicAlignments_1.22.1    bitops_1.0-6                rappdirs_0.3.1              RBGL_1.62.1                 grid_3.6.2                 
[49] gtable_0.3.0                lifecycle_0.2.0             DBI_1.1.0                   magrittr_2.0.1              scales_1.1.1                graph_1.64.0               
[55] stringi_1.5.3               XVector_0.26.0              ellipsis_0.3.1              generics_0.1.0              vctrs_0.3.6                 tools_3.6.2                
[61] bit64_4.0.5                 glue_1.4.2                  purrr_0.3.4                 hms_0.5.3                   colorspace_2.0-0            BiocManager_1.30.10        
[67] memoise_1.1.0

Thanks all.

bioconductor gwascat gwas catalog r • 189 views
4 months ago
Zhilong Jia ★ 1.8k

Re-run cat1 <- makeCurrentGwascat(). The short result is probably due to a network problem during the download. Also, cat is not a good variable name, because R already has a cat function.

My result:


Thanks for the advice on variable naming, I've updated my post.

I tried again this morning and can still only get the 6427 records. I wouldn't say that my internet is overly bad - it's not had problems downloading things before. Is your gwascat version the same as mine?
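
One way to tell a short download apart from a parsing problem is to save the catalog file to disk and compare its raw line count with the number of rows R parses from it. The sketch below is a generic diagnostic, not part of gwascat: check_tsv_rows() is a hypothetical helper, and the small temporary file only stands in for the real download (fetch the real file yourself, for example with download.file(), from the URL printed by makeCurrentGwascat()). It uses base R only. By default read.delim() treats a double quote as a quoting character, so a stray quote in a field can merge several lines into one record.

```r
# Hypothetical helper: compare the raw line count of a tab-separated file
# with the number of rows parsed by read.delim(), with and without quoting.
check_tsv_rows <- function(path) {
  # Data lines in the file, excluding the header line.
  n_lines <- length(readLines(path, warn = FALSE)) - 1L

  # Default parsing: '"' is a quote character, so an unbalanced quote
  # can swallow the lines that follow it.
  default_rows <- nrow(suppressWarnings(
    utils::read.delim(path, stringsAsFactors = FALSE)
  ))

  # Quoting disabled: every physical line becomes one row.
  noquote_rows <- nrow(
    utils::read.delim(path, quote = "", comment.char = "",
                      stringsAsFactors = FALSE)
  )

  c(lines = n_lines, default = default_rows, no_quote = noquote_rows)
}

# Stand-in file (made-up rows, not GWAS Catalog data) with one stray quote.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("SNPS\tTRAIT",
             "rs1\theight",
             "rs2\t\"unterminated trait",
             "rs3\tweight",
             "rs4\tBMI"), tmp)

counts <- check_tsv_rows(tmp)
print(counts)
```

If lines and no_quote agree but default is smaller, the file arrived intact and rows were lost during parsing. If all three are small, the download itself was short.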


My gwascat package version is 2.20.1.


I updated R so that I could update to this version, and now it works :) Thanks!
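
For anyone hitting the same symptom, a quick check is to compare the installed gwascat version with the one reported to work in this thread (2.20.1). The sketch below uses base R's package_version() for the comparison. needs_update() is a hypothetical helper name, and the upgrade call is left commented out because it assumes BiocManager is installed. Bioconductor releases are tied to R versions, so a newer gwascat may require a newer R first, as it did here.

```r
# Hypothetical helper: TRUE when `installed` is older than `minimum`.
needs_update <- function(installed, minimum) {
  package_version(installed) < package_version(minimum)
}

# Versions reported in this thread: 2.18.0 returned 6427 records,
# 2.20.1 is the version reported to work.
needs_update("2.18.0", "2.20.1")

# Check the local installation without loading the package.
if (nzchar(system.file(package = "gwascat"))) {
  installed <- as.character(utils::packageVersion("gwascat"))
  if (needs_update(installed, "2.20.1")) {
    message("gwascat ", installed, " is older than 2.20.1; consider upgrading.")
  }
}

# Upgrade via Bioconductor (assumes BiocManager is installed):
# BiocManager::install("gwascat")
```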
