Question: What software to use to assess phenotypic information from the UK Biobank dataset?
anamaria wrote (12 days ago):

Hello,

I got my data from UK Biobank, for 502536 subjects. I would like to determine which subjects have diabetes-related complications in order to distinguish cases from controls and perform a GWAS on that data.

Right now I can load my data in R:

library(ukbtools)
my_ukb_data <- ukb_df("ukb31212")

and to find ICD10 code names I can use this:

ukb_icd_keyword("diabetes", icd.version = 10)

and I get about 20 listed codes and their explanations. Then, for example, for the E13 code:

> ukb_icd_prevalence(my_ukb_data, icd.version = 10, icd.diagnosis = "E13")
Error in ukb_icd_prevalence(my_ukb_data, icd.version = 10, icd.diagnosis = "E13") : 
  unused argument (icd.diagnosis = "E13")

Is this an issue with the ukbtools software, or are there no subjects in my dataset associated with E13? Can you recommend any other software for exploring/assessing diabetic complications in UK Biobank data?

Thanks


The UKB-supplied programs (in particular ukbconv, https://biobank.ctsu.ox.ac.uk/crystal/download.cgi) allow you to decrypt and convert the data to any format you prefer. You are free to use R, Python, STATA or whatever statistical software you are most comfortable with to analyse the data.

I wrote the R package ukbtools https://kenhanscombe.github.io/ukbtools/index.html to remove the upfront data wrangling required to marry the separate pieces of data into a single dataframe and begin analysis. It includes functionality to query disease diagnoses and demographics. It is fully documented here https://kenhanscombe.github.io/ukbtools/reference/index.html

Reply written 11 days ago by ken.hanscombe
Kevin Blighe wrote (12 days ago):

Hey, you are not using the function correctly. Please review the correct syntax, here: https://www.rdocumentation.org/packages/ukbtools/versions/0.11.3/topics/ukb_icd_prevalence

Kevin


Hi Kevin,

Thanks, I will try it that way. I was following these instructions: https://cran.r-project.org/web/packages/ukbtools/vignettes/explore-ukb-data.html

BTW, do you know how I would assess which phenotypes in my data are related to the E13 ICD10 code? What would be the command for that?

Reply written 11 days ago by anamaria

Also, this doesn't give me anything, and the same goes for a few other diabetes codes I tried:

> ukb_icd_prevalence(my_ukb_data, icd.code = "P70")
[1] NaN
> ukb_icd_prevalence(my_ukb_data, icd.code = "P70.2")
[1] NaN
> ukb_icd_prevalence(my_ukb_data, icd.code = "E13")
[1] NaN
> dim(my_ukb_data)
[1] 502536    131

Reply written 11 days ago by anamaria
ken.hanscombe wrote (11 days ago):

All users of ukbtools benefit if there is a track record of issues raised. I raised an issue for you after one of your first email requests, and it is still open: https://github.com/kenhanscombe/ukbtools/issues/20


ukb_icd_keyword("diabetes", icd.version = 10) is working exactly as described in the documentation https://kenhanscombe.github.io/ukbtools/reference/index.html. It returns all ICD descriptions including the search term supplied.

NB. ukb_icd_keyword and ukb_icd_code_meaning query ICD tables supplied as datasets (icd10chapters, icd10codes, icd9chapters, icd9codes) with the package, and described in the documentation https://kenhanscombe.github.io/ukbtools/reference/index.html#section-datasets


A lot of your subsequent issues look like typos and/or incorrect use of the functionality.

ukb_icd_prevalence has no argument icd.diagnosis (which is what the generic R error is telling you). You need to read the documentation more carefully.
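
As an illustration of that generic error, here is a minimal base R sketch. The function prevalence_demo and the toy data frame are hypothetical stand-ins, not the ukbtools implementation; the point is only that R rejects any named argument that is absent from a function's signature, and that formals() (or args(), or the help page) shows which names are accepted.

```r
# Hypothetical stand-in with an `icd.code` argument but no `icd.diagnosis`
# argument. This is NOT the real ukb_icd_prevalence, just a toy.
prevalence_demo <- function(data, icd.code, icd.version = 10) {
  mean(grepl(icd.code, data$diagnosis))
}

# Toy data, invented for this example.
toy <- data.frame(diagnosis = c("E13", "E119", "H360", "E132"))

# List the argument names the function actually accepts.
arg_names <- names(formals(prevalence_demo))
print(arg_names)

# A name that is not in the signature triggers "unused argument".
err <- tryCatch(
  prevalence_demo(toy, icd.version = 10, icd.diagnosis = "E13"),
  error = function(e) conditionMessage(e)
)
print(err)

# Using an accepted argument name works.
prev <- prevalence_demo(toy, icd.code = "E13")
print(prev)
```

For the real function, args(ukb_icd_prevalence) or ?ukb_icd_prevalence should show the accepted argument names in the installed version.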


icd.code = "P70" and icd.code = "E13" work fine for me. icd.code = "P70.2" is not valid: no ICD codes in UKB data include a decimal point. Look at the data. Try icd.code = "P702".

What error are you getting exactly (for the valid codes)? Are you sure you have hospital episode statistics data ("diagnoses") in your UKB data?

NB. The argument to icd.code is a regular expression (as described in the documentation). To understand which codes you're requesting the frequency of, you can do a regex search on the supplied icd10codes dataset, e.g., filter(icd10codes, str_detect(code, "E13")). If you want the prevalence of a specific code, e.g. E13.2 With renal complications, it is safest to use icd.code = "^E13.2$".
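
To make the regular-expression behaviour concrete, here is a small base R sketch on an invented vector of codes. The codes are written without a decimal point, which is an assumption taken from the comment above rather than something checked against real UK Biobank data. Note that an unescaped "." in a regular expression matches any single character, so a pattern containing a dot will not match a code that is stored without one.

```r
# Toy codes, stored without a decimal point (assumption for illustration).
codes <- c("E13", "E130", "E132", "E139", "E119", "H360")

# Unanchored: matches E13 and all of its sub-codes.
any_e13 <- codes[grepl("E13", codes)]

# Anchored on both sides: matches only the three-character code.
exact_e13 <- codes[grepl("^E13$", codes)]

# Anchored four-character code, written without the dot.
exact_e132 <- codes[grepl("^E132$", codes)]

# "." needs one extra character between "E13" and "2", so nothing in this
# dot-free toy vector matches.
dot_pattern <- codes[grepl("^E13.2$", codes)]

# If the codes did contain a dot, an escaped dot matches it literally.
dotted <- c("E13.2", "E13.9")
escaped_dot <- dotted[grepl("^E13\\.2$", dotted)]

print(any_e13)
print(exact_e13)
print(exact_e132)
print(dot_pattern)
print(escaped_dot)
```

Checking which form your own diagnosis columns use (with or without the dot) before choosing a pattern avoids silently matching nothing.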


Hi Ken,

thank you for those clarifications!

So what I am trying to use your code for is this: to relate my selected ICD10 codes (say H360) to selected phenotypes and measurements given in my data file.

For example, how would I identify/extract the 4235 individuals mentioned on the page below and define them as my cases?

http://biobank.ctsu.ox.ac.uk/showcase/field.cgi?id=6148

Also I tried using:

> filter(icd10codes, str_detect(code, "E13"))
Error in stri_detect_regex(string, pattern, opts_regex = opts(pattern)) : 
  object 'code' not found
In addition: Warning messages:
1: In data.matrix(data) : NAs introduced by coercion
2: In data.matrix(data) : NAs introduced by coercion

What should be given there instead of "code"?

The files I have available are these:

> list.files()
 [1] "archive.tar.gz"                "encoding.ukb"                 
 [3] "fields.ukb"                    "HESDataDic.xlsx"              
 [5] "HESTables.xlsx"                "HospitalEpisodeStatistics.pdf"
 [7] "k44316.key"                    "ukb31212.csv"                 
 [9] "ukb31212.enc"                  "ukb31212.enc_ukb"             
[11] "ukb31212.html"                 "ukb31212.log"                 
[13] "ukb31212.r"                    "ukb31212.tab"                 
[15] "ukbconv"                       "ukbgene"                      
[17] "ukbmd5"                        "ukbunpack"

I downloaded the HES files from here: https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=2000

Please let me know whether those are the HES files you are referring to.

Thank you for your help! Ana

Reply written 11 days ago by anamaria

Also, is there a workaround for this error that I am getting?

> my_ukb_data[1:3,1:3]
      eid sex_f31_0_0 year_of_birth_f34_0_0
1 1000017      Female                  1938
2 1000025      Female                  1951
3 1000038        Male                  1961

> ukb_icd_diagnosis(my_ukb_data, id = "1000017", icd.version = 10)
Error: Column 1 must be named.
Use .name_repair to specify repair.
Call `rlang::last_error()` to see a backtrace

Reply written 11 days ago by anamaria

Also, this command always gives me NaN, and I have tried it with multiple codes:

> ukb_icd_prevalence(my_ukb_data, icd.code = "H360")
[1] NaN

Reply written 10 days ago by anamaria

Hi Ken,

Can you please explain what you meant by:

filter(icd10codes, str_detect(code, "E13"))

What is "code" in this example?

What I want to do is to extract from my dataset the cases which comply with these 2 definitions:

Data-Field 41270: E10.3, E11.3, E14.3, H360 + Data-Field 6148: Diabetes related eye disease (in the questionnaire they answered Yes)

How do I do that using your code?

Thanks Ana

Reply written 5 days ago by anamaria
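
One possible way to approach the two-part case definition asked about above is sketched below in base R on a toy data frame. Everything here is an assumption for illustration: the column names (icd10_a, icd10_b, eye_problem) are hypothetical placeholders rather than the names ukb_df() would generate for fields 41270 and 6148, the codes are written without a decimal point as suggested earlier in the thread, and the toy rows are invented. Because the "+" in the definition could mean either "and" or "or", both versions are computed.

```r
# Toy data; column names and values are hypothetical placeholders.
toy <- data.frame(
  eid = c(1, 2, 3, 4, 5),
  icd10_a = c("E113", "I10", NA, "H360", "E119"),
  icd10_b = c(NA, "E103", NA, NA, NA),
  eye_problem = c("Diabetes related eye disease", "Cataract",
                  "Diabetes related eye disease", NA, NA),
  stringsAsFactors = FALSE
)

icd_cols <- c("icd10_a", "icd10_b")
case_pattern <- "^(E103|E113|E143|H360)$"

# TRUE when any diagnosis column of a row holds one of the target codes.
# grepl() returns FALSE for NA, so missing diagnoses never count as a match.
has_icd <- unname(apply(toy[icd_cols], 1,
                        function(x) any(grepl(case_pattern, x))))

# TRUE when the self-reported answer is the diabetes eye disease category.
self_report <- !is.na(toy$eye_problem) &
  toy$eye_problem == "Diabetes related eye disease"

# Cases under the two possible readings of the definition.
cases_either <- toy$eid[has_icd | self_report]
cases_both <- toy$eid[has_icd & self_report]

print(cases_either)
print(cases_both)
```

On real data the same logic would need to run over every array/instance column belonging to each field, so the list of columns should be derived from the actual column names rather than typed by hand.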