Question: R: Inconsistency between data from TCGAbiolinks and GDC (does TCGAbiolinks retrieve Legacy data by default?)
2.6 years ago, fr100 wrote:

Hi!

I'm using R and TCGAbiolinks to retrieve data and clinical data from GDC. To do so I use the following:

library(TCGAbiolinks)
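patientdownload <- function(project = "TCGA-LIHC") {
  # Query the clinical data for the given project
  clinquery <- GDCquery(project = project, data.category = "Clinical")
  GDCdownload(clinquery, chunks.per.download = 30)
  # Parse the downloaded files into a patient-level data frame
  GDCprepare_clinic(clinquery, clinical.info = "patient")
}
prepatientout <- patientdownload("TCGA-LIHC")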

However, I am finding some inconsistencies between what I get and what is in GDC. For instance, for the subject with 'bcr_patient_barcode=TCGA-DD-AADB' I retrieve the following data from 'GDCquery':

    bcr_patient_barcode  gender  race_list  vital_status  neoplasm_histologic_grade  stage_event_pathologic_stage
18  TCGA-DD-AADF         FEMALE  ASIAN      Dead          G4                         Stage I
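
For anyone who wants to make the same comparison, here is a minimal sketch of pulling one patient's fields out of the prepared data frame. It assumes the result of 'GDCprepare_clinic' has the column names shown above. The toy data frame is only a stand-in that copies the row printed above, and 'lookup_patient' is a hypothetical helper, not part of TCGAbiolinks.

```r
# Toy stand-in for the data frame returned by GDCprepare_clinic();
# the single row copies the values printed above.
prepatientout <- data.frame(
  bcr_patient_barcode = "TCGA-DD-AADF",
  gender = "FEMALE",
  race_list = "ASIAN",
  vital_status = "Dead",
  neoplasm_histologic_grade = "G4",
  stage_event_pathologic_stage = "Stage I",
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the selected fields for one patient barcode.
lookup_patient <- function(df, barcode, fields) {
  df[df$bcr_patient_barcode == barcode, fields, drop = FALSE]
}

hit <- lookup_patient(
  prepatientout,
  "TCGA-DD-AADF",
  c("bcr_patient_barcode", "neoplasm_histologic_grade", "stage_event_pathologic_stage")
)
print(hit)
```

With the real data, the same call on the full 'prepatientout' gives the values to check against the patient's page on the GDC portal.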

However, when you look at the subject's data in GDC (here), everything agrees with the exception of Grade, which is not reported there at all.

Why?

Could this mean that 'neoplasm_histologic_grade' is not the tumor grade? Or that 'GDCquery' is retrieving some Legacy data?

EDIT: this is now cross-posted on GitHub

rna-seq R genome • 1.2k views
2.6 years ago, fr100 wrote:

An answer to this question was posted on the TCGAbiolinks GitHub. Full credit goes to tiagochst (who is also on Biostars, but I can't tag him).
