Question

TCGAbiolinks HT-Seq count

0

4.8 years ago

Rob ▴ 180

Hi friends

I am trying to get HT-Seq count data with TCGAbiolinks in R. What I get is an S4 object, and I do not know how to convert it to a table that I can use for differential expression analysis. Can anyone tell me how, please?

Thanks

RNA-Seq • 2.7k views

ADD COMMENT • link updated 4.8 years ago by Hamid Ghaedi 3.3k • written 4.8 years ago by Rob ▴ 180

1

Please show all commands that you have used, and indicate the R version and operating system that you are using. Show samples of your data where possible and feasible. Thank you.

ADD REPLY • link 4.8 years ago by Kevin Blighe 89k

0

I am working on Windows 10 (64-bit) with R version 4.0.2. I want only primary tumor data as a table. My code:

library(TCGAbiolinks)

CancerProject <- "TCGA-KIRC"
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification", 
                  workflow.type = "HTSeq - Counts")

# download raw counts for DESeq2
GDCdownload(query)
data <- GDCprepare(query)

ADD REPLY • link updated 4.8 years ago by Kevin Blighe 89k • written 4.8 years ago by Rob ▴ 180

Ram · Answer 1 · 2020-09-12

3

4.8 years ago

Hamid Ghaedi 3.3k

It is always recommended to show what you have tried, your code, and so on. Just to write something helpful, here is code for downloading a cancer expression matrix and the associated clinical data using the TCGAbiolinks package.

library(TCGAbiolinks)
library(SummarizedExperiment)

# Query normalized RNA-Seq gene expression results from the legacy archive
query <- GDCquery(project = "TCGA-BLCA", # the TCGA project ID for your cancer
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq",
                  file.type = "normalized_results",
                  experimental.strategy = "RNA-Seq",
                  legacy = TRUE)
GDCdownload(query, method = "api")
dat <- GDCprepare(query = query, save = TRUE, save.filename = "exp.rda")
rna <- as.data.frame(assay(dat))        # expression matrix (genes x samples)
clinical <- as.data.frame(colData(dat)) # associated clinical data (one row per sample)

You can access the contents of the S4 object with the accessor functions of the SummarizedExperiment package. Here I used the assay function from this package to make a data frame out of the expression matrix.
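To make the accessor idea concrete, here is a minimal sketch on a toy object rather than real TCGA data. The gene names, sample names, and counts are made up for illustration; the assumption is that the object returned by GDCprepare is a SummarizedExperiment, so the same assay and colData calls should apply to it.

```r
library(SummarizedExperiment)

# Toy stand-in for the object returned by GDCprepare(); all values are made up
counts <- matrix(c(10L, 0L, 25L, 3L, 7L, 12L), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("sample1", "sample2")))
sample_info <- S4Vectors::DataFrame(
  sample_type = c("Primary Tumor", "Solid Tissue Normal"),
  row.names = colnames(counts))

se <- SummarizedExperiment(assays = list(counts = counts),
                           colData = sample_info)

# assay() returns the genes x samples matrix; colData() returns per-sample annotation
count_table <- as.data.frame(assay(se))
sample_table <- as.data.frame(colData(se))

print(count_table)
print(sample_table)
```

The rows of sample_table are in the same order as the columns of count_table, which is the layout most differential expression tools expect.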

ADD COMMENT • link 4.8 years ago by Hamid Ghaedi 3.3k

0

Hi Hamid

this is what I tried:

library(TCGAbiolinks)

CancerProject <- "TCGA-KIRC"
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

# download raw counts for DESeq2
GDCdownload(query)
data <- GDCprepare(query)

ADD REPLY • link updated 4.8 years ago by Ram 45k • written 4.8 years ago by Rob ▴ 180

1

@rhasanvandj, please use the "code sample" option to format your code correctly. Reading the error you get when running your code helps you understand what is going wrong. Comparing your code with what I posted above will guide you toward the correct syntax for TCGAbiolinks and for other packages. In the provided answer, just replace TCGA-BLCA with TCGA-KIRC to get what you want.

ADD REPLY • link 4.8 years ago by Hamid Ghaedi 3.3k

0

Thanks, Hamid. It works now.

ADD REPLY • link 4.8 years ago by Rob ▴ 180

0

You're welcome. Please do not hesitate to click the like button beside the answer :)

ADD REPLY • link 4.8 years ago by Hamid Ghaedi 3.3k

0

Hi Hamid, what do you mean by the link button beside the answer? Haha.

I downloaded the HTSeq files, but they cover about 600 patients. How can I keep only the primary tumors rather than all samples?

ADD REPLY • link 4.8 years ago by Rob ▴ 180

0

:) It's not link, it is LIKE! Anyway, to get an answer to your new question, you should create a new question thread. This time, please include as much detail as possible about your question, what you have tried, and so on.

ADD REPLY • link 4.8 years ago by Hamid Ghaedi 3.3k

0

I don't see any Like button on my page. I don't know why.

ADD REPLY • link 4.8 years ago by Rob ▴ 180

0

You don't see this?

ADD REPLY • link 4.8 years ago by Kevin Blighe 89k

0

This is what Hamid is referring to. Please accept his answer.

Upvote|Bookmark|Accept

ADD REPLY • link 4.8 years ago by Ram 45k

0

I liked your response.

ADD REPLY • link 4.8 years ago by Rob ▴ 180

0

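You can tell what you have by inspecting your sample names (column names). TCGA has a standard nomenclature system: each sample name is a TCGA barcode, and characters 14-15 of the barcode hold the two-digit sample type code.

Tumor types range from 01 to 09, normal types from 10 to 19, and control samples from 20 to 29. Primary solid tumors have code 01.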
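As a rough sketch of how that code can be used to keep only primary tumors, the snippet below pulls characters 14-15 out of each barcode with base R and subsets the columns of a count table. The barcodes and counts are made-up examples, and expr is a hypothetical stand-in for the matrix you would get from assay() on your own object.

```r
# Hypothetical barcodes and counts, for illustration only
barcodes <- c("TCGA-B0-5402-01A-01R-1541-07",
              "TCGA-B0-5402-11A-01R-1541-07",
              "TCGA-CJ-4634-01A-02R-1325-07")
expr <- matrix(1:6, nrow = 2,
               dimnames = list(c("geneA", "geneB"), barcodes))

# Characters 14-15 of a TCGA barcode are the sample type code
sample_type_code <- substr(colnames(expr), 14, 15)

# 01 = primary solid tumor; 10-19 would be normal samples
is_primary_tumor <- sample_type_code == "01"
primary_expr <- expr[, is_primary_tumor, drop = FALSE]

print(table(sample_type_code))
print(colnames(primary_expr))
```

TCGAbiolinks also provides TCGAquery_SampleTypes, which should do the same selection from a vector of barcodes when called with typesample = "TP"; check the documentation of your installed version before relying on it.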