Question: HT-Seq count data coding gene
3 months ago by
Rob30 wrote:

Hello friends, I want to download HTSeq count data from TCGA with TCGAbiolinks. How can I download only the coding genes? What should I add to my code? This is the code I am using:


library(TCGAbiolinks)

CancerProject <- "TCGA-KIRC"
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  sample.type = c("Primary Tumor"),
                  workflow.type = "HTSeq - Counts")
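# download raw counts for DESeq2
GDCdownload(query)
data <- GDCprepare(query, save = TRUE, save.filename = "exp.rda")
# expression matrix (genes x samples) extracted from the prepared object
rna <- SummarizedExperiment::assay(data)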
write.csv(rna, "rna.csv")
rna-seq • 219 views
Written 3 months ago by Rob30

Dear rhasanvandj, as far as I know, there is no way to restrict the download to coding genes. You need to download the full data set and run your analysis, then select the genes you are interested in (here, coding genes).

Once you have a list of genes, you can retrieve their biotype (protein coding, non-coding, and so on) from Ensembl with the biomaRt package.

Reply written 3 months ago by Hamid Ghaedi

Thank you so much dear Hamid

Reply written 3 months ago by Rob30
3 months ago by
Barry Digby (National University of Ireland, Galway) wrote:

This is what Hamid Ghaedi is referring to:

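library(biomaRt)

## filter for protein coding genes in matrix (currently > 50,000 rows)
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# request the gene ID and biotype as well, since both are used in the filter below
mrna_attributes <- getBM(attributes = c("external_gene_name",
                                        "ensembl_gene_id",
                                        "gene_biotype"),
                         filters = c("ensembl_gene_id"),
                         values = rownames(rna),
                         mart = mart)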
mrna_attributes <- mrna_attributes[which(mrna_attributes$gene_biotype == "protein_coding"),]
rna <- rna[which(rownames(rna) %in% mrna_attributes$ensembl_gene_id),]
Modified 3 months ago, written 3 months ago by Barry Digby
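
One thing that may trip up the filter above: if the row names of `rna` carry Ensembl version suffixes (a trailing `.<number>`), they will not match the unversioned `ensembl_gene_id` values that biomaRt returns, and the `%in%` filter would drop every row. Whether this applies depends on how the matrix was prepared, so check `head(rownames(rna))` first. Below is a minimal sketch of the matching step in base R. The count matrix `rna_toy`, the annotation table `annot_toy` (a stand-in for the real `getBM()` result), the helper `strip_version()`, and all IDs and biotypes in it are made up for illustration.

```r
# Toy count matrix; the versioned IDs below are made up for illustration.
rna_toy <- matrix(
  c(10L, 0L, 5L, 20L, 3L, 7L),
  nrow = 3,
  dimnames = list(
    c("ENSG00000000001.4", "ENSG00000000002.11", "ENSG00000000003.2"),
    c("sample_A", "sample_B")
  )
)

# Toy stand-in for the data frame returned by getBM(); biotypes are made up.
annot_toy <- data.frame(
  ensembl_gene_id = c("ENSG00000000001", "ENSG00000000002", "ENSG00000000003"),
  gene_biotype    = c("protein_coding", "lncRNA", "protein_coding"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop a trailing ".<digits>" version suffix so the IDs
# match Ensembl's unversioned form.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

rownames(rna_toy) <- strip_version(rownames(rna_toy))

# Keep only rows whose ID is annotated as protein coding.
coding_ids <- annot_toy$ensembl_gene_id[annot_toy$gene_biotype == "protein_coding"]
rna_coding <- rna_toy[rownames(rna_toy) %in% coding_ids, , drop = FALSE]
```

With real data, the stripped IDs would be passed as `values` to `getBM()`. The `drop = FALSE` keeps the result a matrix even if only one gene survives the filter.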

Thanks Barry. I tried this code, but I got this error:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'rna' not found
Reply written 3 months ago by Rob30

Then make sure the rna object exists in your R session. R is complaining that it cannot find an object with that name. See the very end of the error message:

....object 'rna' not found
Reply written 3 months ago by Hamid Ghaedi
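
The error means that no object called `rna` exists in the session, so `rownames(rna)` cannot be evaluated. The count matrix has to be created before the biomaRt filter runs. By default `GDCprepare()` returns a SummarizedExperiment, and the matrix is usually pulled out of it with `assay()`. The sketch below shows that step on a small hand-built object. It assumes the SummarizedExperiment package is installed, and `se_toy`, `counts_toy`, and their IDs and values are made-up stand-ins for the real `data` object.

```r
library(SummarizedExperiment)

# Toy stand-in for the object returned by GDCprepare(); values and IDs are made up.
counts_toy <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("ENSG00000000001", "ENSG00000000002", "ENSG00000000003"),
                  c("sample_A", "sample_B"))
)
se_toy <- SummarizedExperiment(assays = list(counts = counts_toy))

# exists() is a quick way to check whether 'rna' has been defined yet.
if (!exists("rna")) {
  message("'rna' is not defined yet; creating it from the experiment object")
}

# assay() returns the first assay (here the count matrix) as a plain matrix.
rna <- assay(se_toy)

# Sanity checks before passing rownames(rna) on to getBM().
stopifnot(is.matrix(rna), !is.null(rownames(rna)))
```

With the real data, `rna <- assay(data)` would take the place of the toy object, after which `rownames(rna)` can be passed to `getBM()`.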
