TCGAbiolinks: peripheral blood healthy control
3.9 years ago
Oli • 0

Hey community, I hope this question isn't off topic here. I have downloaded the Acute Myeloid Leukemia RNA-seq raw count data from TCGA with the TCGAbiolinks package, using the following code:

library(TCGAbiolinks)
query <- GDCquery(project = "TCGA-LAML",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification", 
                  workflow.type = "HTSeq - Counts",
                  sample.type = "Primary Blood Derived Cancer - Peripheral Blood")

This was followed by the download and prepare functions. However, I don't understand how to retrieve the same type of data from healthy blood to use as a control for DEG analysis. If I run the same script with sample.type = "Blood Derived Normal", I get an error message stating that there is no result matching my query. Can anyone help me out?

Thanks in advance!

R RNA-Seq • 1.7k views
3.9 years ago
bruce.moran ▴ 960

See the TCGAquery_SampleTypes function

It takes the arguments barcode and typesample; typesample plays the role of the sample.type parameter above, but is given as short codes such as "NB".

If you have all samples downloaded/prepared:

query <- GDCquery(project = "TCGA-LAML",
              data.category = "Transcriptome Profiling",
              data.type = "Gene Expression Quantification", 
              workflow.type = "HTSeq - Counts")
GDCdownload(query, method = "api", files.per.chunk = 100)
laml <- GDCprepare(query)

then this will return the barcodes of all samples in one of the normal categories:

TCGAquery_SampleTypes(barcode = laml$barcode, typesample = c("NBM", "NEBV", "NBC", "NB"))

Note that there are no normals available for LAML using legacy = FALSE; I cannot currently test with legacy = TRUE.
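
For what it's worth, the short codes above correspond to the two-digit sample type code stored in characters 14-15 of a TCGA barcode (01-09 are tumor types, 10-19 are normal types). Below is a rough base R sketch of the same kind of filter done by hand. It is not how TCGAquery_SampleTypes is implemented, the barcodes are made up for illustration, and the lookup table only covers a few blood-related codes:

```r
# Subset of the TCGA sample type codes relevant to blood samples.
sample_type_codes <- c(
  "03" = "Primary Blood Derived Cancer - Peripheral Blood",
  "09" = "Primary Blood Derived Cancer - Bone Marrow",
  "10" = "Blood Derived Normal",
  "12" = "Buccal Cell Normal",
  "13" = "EBV Immortalized Normal",
  "14" = "Bone Marrow Normal"
)

# Hypothetical barcodes, invented for this example only.
barcodes <- c(
  "TCGA-AB-0001-03A-01R-0000-13",
  "TCGA-AB-0002-10A-01R-0000-13",
  "TCGA-AB-0003-09A-01R-0000-13"
)

# Characters 14-15 of the barcode hold the two-digit sample type code.
codes <- substr(barcodes, 14, 15)

# Translate the codes to their long names (NA for codes not in the table).
sample_types <- unname(sample_type_codes[codes])

# Codes 10-19 denote normal samples.
is_normal <- as.integer(codes) >= 10 & as.integer(codes) <= 19
normal_barcodes <- barcodes[is_normal]

print(data.frame(barcode = barcodes, code = codes, sample_type = sample_types))
print(normal_barcodes)
```

With a prepared object you would pass laml$barcode instead of the invented vector. If normal_barcodes comes back empty, the project simply has no normal samples in that data type.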


Thank you for your help, but I'm not able to solve the problem. The returned message is always that there is no matching result, even when I run the query for the TARGET-AML project. Moreover, the second piece of code expects something else and does not return the sample barcodes.


OK, you asked the question about TCGA-LAML, but now you want info on TARGET-AML. Those are different datasets, so methods to interrogate them don't necessarily transfer between them.

What version of TCGAbiolinks are you using? I had issues recently and upgraded to TCGAbiolinks_2.17.1 using BiocManager::install("BioinformaticsFMRP/TCGAbiolinks").

Post your error message if you're getting one.

FWIW I used the same code to download TARGET-AML; table(laml@colData$sample_type) does not show any 'normal' samples. Do you expect there should be?

       Primary Blood Derived Cancer - Bone Marrow
                                              119
  Primary Blood Derived Cancer - Peripheral Blood
                                               26
     Recurrent Blood Derived Cancer - Bone Marrow
                                               40
Recurrent Blood Derived Cancer - Peripheral Blood
                                                2
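
As a quick sanity check on that output, here is a small base R sketch that re-enters the counts from the table above by hand and looks for any category with "Normal" in its name. The vector is typed in from the output, not pulled from the GDC, so treat it as an illustration only:

```r
# Counts copied by hand from the table(...) output above (TARGET-AML).
sample_type_counts <- c(
  "Primary Blood Derived Cancer - Bone Marrow" = 119,
  "Primary Blood Derived Cancer - Peripheral Blood" = 26,
  "Recurrent Blood Derived Cancer - Bone Marrow" = 40,
  "Recurrent Blood Derived Cancer - Peripheral Blood" = 2
)

# Flag categories whose name contains "Normal".
is_normal <- grepl("Normal", names(sample_type_counts), fixed = TRUE)

# Total samples in normal vs. non-normal categories.
n_normal <- sum(sample_type_counts[is_normal])
n_tumor <- sum(sample_type_counts[!is_normal])

print(c(normal = n_normal, tumor = n_tumor))
```

On a real prepared object, the same grepl() test could be applied to names(table(laml@colData$sample_type)).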

You're right, I mentioned both the TCGA and TARGET projects, but that's because for what I need to do they are roughly equivalent. Anyway, from what I can see on the GDC website, samples of normal peripheral blood should be available, as well as many normal bone marrow ones. I checked my version of TCGAbiolinks and it's updated to the latest release. By the way, I also found a bug report in the GitHub issues section of the package, but I am not completely sure it actually is a bug. For now my problem is not solved, but I'll update the post when/if I find a solution. Many thanks!

edited for clarity
