Hi friends
I am trying to get HTSeq count data with TCGAbiolinks in R. What I get is an S4 object, and I do not know how to convert it to a table that I can use for differential expression analysis. Can anyone tell me how, please?
Thanks
It is always recommended to show what you have tried, your code, and so on.
Just to write something helpful, here is code for downloading a cancer expression matrix and the associated clinical data using the TCGAbiolinks package.
library(TCGAbiolinks)
library(SummarizedExperiment)

# legacy = TRUE targets the GDC legacy archive; it may not be available in newer TCGAbiolinks versions
query <- GDCquery(project = "TCGA-BLCA",  # the TCGA name for your cancer
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq",
                  file.type = "normalized_results",
                  experimental.strategy = "RNA-Seq",
                  legacy = TRUE)
GDCdownload(query, method = "api")
dat <- GDCprepare(query = query, save = TRUE, save.filename = "exp.rda")
rna <- as.data.frame(assay(dat))         # expression matrix
clinical <- as.data.frame(colData(dat))  # associated clinical data
You can access the contents of the S4 object with the accessor functions of the SummarizedExperiment package. Here I used the assay function from this package to make a data frame out of the expression matrix.
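As a minimal sketch of what those accessors do, the snippet below builds a toy SummarizedExperiment by hand instead of downloading anything. The gene names, sample names, and counts are made up for illustration; only assay() and colData() correspond to what is used in the answer above.

```r
library(SummarizedExperiment)

# Toy count matrix: 3 genes x 4 samples (values are made up for illustration)
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3", "s4")))

# Toy sample annotation, one row per column of the count matrix
samples <- DataFrame(condition = c("tumor", "tumor", "normal", "normal"),
                     row.names = colnames(counts))

se <- SummarizedExperiment(assays = list(counts = counts), colData = samples)

# Accessor functions instead of reaching into slots with @
expr_table   <- as.data.frame(assay(se))    # genes in rows, samples in columns
sample_table <- as.data.frame(colData(se))  # one row per sample
```

The object returned by GDCprepare can be handled the same way, with expr_table playing the role of rna and sample_table the role of clinical.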
Hi Hamid
This is what I tried:
library(TCGAbiolinks)
CancerProject <- "TCGA-KIRC"
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")
# download raw counts for DESeq2
GDCdownload(query)
data <- GDCprepare(query)
@rhasanvandj, please use the "code sample" option so that your code is displayed in the correct format. Reading the error you get from running your code helps you understand what is going wrong. Comparing your code with what I posted for you above will guide you toward the correct syntax for TCGAbiolinks and other packages. In the provided answer, just replace TCGA-BLCA with TCGA-KIRC to get what you want.
You can find out what you have by inspecting your sample names (column names). TCGA has a standard nomenclature system for them.
Tumor types range from 01 to 09, normal types from 10 to 19, and control samples from 20 to 29.
Read more here.
BTW, for your problem:
Say your expression matrix is named rna.
Run this to see which sample types you have (characters 14 and 15 of each column name hold the sample type code):
table(substr(colnames(rna),14,15))
The code 01 indicates a primary solid tumor. After running this, remove the samples you do not want to keep.
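The filtering step could look roughly like the sketch below. The barcodes and counts are hypothetical stand-ins so that the example runs on its own; with real data, rna would be the matrix or data frame obtained from GDCprepare.

```r
# Hypothetical barcodes; real ones come from colnames() of your own matrix
barcodes <- c("TCGA-AA-0001-01A-11R-A000-07",  # 01: primary solid tumor
              "TCGA-AA-0002-11A-11R-A000-07",  # 11: solid tissue normal
              "TCGA-AA-0003-01A-11R-A000-07",  # 01: primary solid tumor
              "TCGA-AA-0004-06A-11R-A000-07")  # 06: metastatic

# Toy expression matrix: 2 genes x 4 samples (values are made up)
rna <- matrix(1:8, nrow = 2,
              dimnames = list(c("geneA", "geneB"), barcodes))

# Characters 14-15 of the barcode hold the sample type code
sample_type <- substr(colnames(rna), 14, 15)
table(sample_type)

# Keep only primary solid tumors; drop = FALSE keeps the table shape
# even if a single sample remains
primary <- rna[, sample_type == "01", drop = FALSE]
```

Here primary contains only the columns whose sample type code is 01, and it can be used as the table for downstream analysis.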
Please show all commands that you have used, and indicate the R version and operating system that you are using. Show samples of your data where possible and feasible. Thank you.
I am working on Windows 10 (64-bit) with R version 4.0.2. I want only primary tumor data as a table. My code: