TCGAbiolinks HTSeq counts
Rob, 3.6 years ago

Hi friends

I am trying to get HTSeq count data with TCGAbiolinks in R. What I get is an S4 object, and I do not know how to convert it to a table to use for differential expression analysis. Can anyone tell me, please?

Thanks


Please show all commands that you have used, and indicate the R version and operating system that you are using. Show samples of your data where possible and feasible. Thank you.


I am working on Windows 10 (64-bit) with R version 4.0.2. I want only the primary tumor data, as a table. My code:

library(TCGAbiolinks)

CancerProject <- "TCGA-KIRC"
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification", 
                  workflow.type = "HTSeq - Counts")

# download raw counts for DESeq2
GDCdownload(query)
data <- GDCprepare(query)
3.6 years ago

It is always recommended to show what you have tried, your code, and so on. Just to write something helpful, here is code for downloading a cancer expression matrix and the associated clinical data using the TCGAbiolinks package.

library(TCGAbiolinks)
library(SummarizedExperiment)

# Legacy-archive RNA-Seq query; "normalized_results" are normalized values, not raw counts
query <- GDCquery(project = "TCGA-BLCA", # the TCGA name for your cancer
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq",
                  file.type = "normalized_results",
                  experimental.strategy = "RNA-Seq",
                  legacy = TRUE)
GDCdownload(query, method = "api")
dat <- GDCprepare(query = query, save = TRUE, save.filename = "exp.rda")
rna <- as.data.frame(assay(dat))          # expression matrix (genes x samples)
clinical <- as.data.frame(colData(dat))   # associated clinical data (one row per sample)

You can access the contents of the S4 object with the various functions of the SummarizedExperiment package. Here I used its assay() function to make a data frame out of the expression matrix.
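As a minimal illustration of that conversion, the sketch below builds a tiny toy SummarizedExperiment by hand instead of downloading anything, then extracts the expression matrix with `assay()` and the per-sample annotation with `colData()`. The gene names, barcodes and the `sample_type` column are made up, and it assumes the Bioconductor package SummarizedExperiment is installed. The object returned by `GDCprepare()` can be handled in the same way.

```r
library(SummarizedExperiment)

# Hypothetical toy counts: 2 genes x 3 samples, columns named like TCGA barcodes
counts <- matrix(c(10L, 0L, 5L, 3L, 8L, 1L), nrow = 2,
                 dimnames = list(c("GENE_A", "GENE_B"),
                                 c("TCGA-XX-0001-01A", "TCGA-XX-0002-11A",
                                   "TCGA-XX-0003-01A")))

# Hypothetical per-sample annotation, standing in for the clinical colData
annot <- S4Vectors::DataFrame(
  sample_type = c("Primary Tumor", "Solid Tissue Normal", "Primary Tumor"),
  row.names = colnames(counts)
)

# A SummarizedExperiment, the S4 class family that GDCprepare() returns
se <- SummarizedExperiment(assays = list(counts = counts), colData = annot)

rna <- as.data.frame(assay(se))          # genes x samples table
clinical <- as.data.frame(colData(se))   # one row per sample
```

`assay()` on its own already gives an ordinary matrix in a case like this; `as.data.frame()` is only needed when you want a plain table.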


Hi Hamid,

This is what I tried:

library(TCGAbiolinks)

CancerProject <- "TCGA-KIRC"
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

# download raw counts for DESeq2
GDCdownload(query)
data <- GDCprepare(query)

@rhasanvandj, please use the "code sample" option to format your code correctly. Reading the error you get when running your code helps you understand what is going wrong, and comparing your code with what I posted above will guide you toward the correct syntax for TCGAbiolinks and other packages. In the provided answer, just replace TCGA-BLCA with TCGA-KIRC to get what you want.


Thanks Hamid, it works now.


You're welcome! Please do not hesitate to click the Like button beside the answer :)


Hi Hamid, what do you mean by the link button beside the answer? haha

I downloaded the HTSeq files, but they cover about 600 patients. How can I filter for primary tumors only, rather than all patients?


:) It's not link, it's LIKE! Anyway, to get an answer to your new question you should create a new question thread. This time, please include as many details as possible about your question, what you have tried, and so on.


I don't see any Like button on my page, and I don't know why.


You don't see this?

[Screenshot: untitled]


This is what Hamid is referring to. Please accept his answer.

Upvote | Bookmark | Accept


I liked your response.


You can tell what you have by inspecting your sample names (column names). TCGA barcodes follow a standard nomenclature.

In the barcode's two-digit sample type code, tumor types range from 01 to 09, normal types from 10 to 19, and control samples from 20 to 29.

Read more here.
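The ranges above can be turned into a small helper. This is only a sketch: `sample_category()` is a hypothetical name, the barcodes are made up, and it assumes full TCGA barcodes in which characters 14-15 hold the two-digit sample type code.

```r
# Hypothetical helper: classify TCGA barcodes by their sample type code
# (characters 14-15): 01-09 tumor, 10-19 normal, 20-29 control.
sample_category <- function(barcodes) {
  code <- as.integer(substr(barcodes, 14, 15))
  # cut() uses right-closed intervals (0,9], (9,19], (19,29]; other codes give NA
  category <- cut(code, breaks = c(0, 9, 19, 29),
                  labels = c("tumor", "normal", "control"))
  as.character(category)
}

barcodes <- c("TCGA-XX-0001-01A", "TCGA-XX-0002-11A", "TCGA-XX-0003-20A")
sample_category(barcodes)  # returns c("tumor", "normal", "control")
```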

BTW, for your problem:

Say your expression matrix is named rna.

Run this to see what you have:

table(substr(colnames(rna), 14, 15))

01 indicates a primary solid tumor. After running this, remove the samples you don't want to keep.
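Putting that together, a minimal sketch of keeping only primary solid tumors might look like this. It uses a hypothetical toy `rna` table with made-up barcodes in place of the real data frame, and assumes the column names are full TCGA barcodes.

```r
# Hypothetical toy expression table: genes in rows, TCGA-style barcodes in columns
rna <- data.frame(
  "TCGA-XX-0001-01A" = c(10, 0),
  "TCGA-XX-0002-11A" = c(5, 3),
  "TCGA-XX-0003-01A" = c(8, 1),
  row.names = c("GENE_A", "GENE_B"),
  check.names = FALSE   # keep the dashes in the barcodes
)

# Two-digit sample type code from each barcode
sample_type <- substr(colnames(rna), 14, 15)
print(table(sample_type))

# Keep only primary solid tumors (code 01); drop = FALSE keeps a data frame
rna_primary <- rna[, sample_type == "01", drop = FALSE]
colnames(rna_primary)
```

If you keep the SummarizedExperiment object instead, the same logical vector can subset it directly (for example `dat[, substr(colnames(dat), 14, 15) == "01"]`), which keeps the clinical `colData` rows aligned with the remaining samples.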
