HTSeq count data: protein-coding genes only
Asked by Rob, 11 months ago

Hello friends, I want to download HTSeq count data from TCGA using TCGAbiolinks. How can I download only protein-coding genes? What should I add to my code? This is the code I am using:

BiocManager::install("BioinformaticsFMRP/TCGAbiolinks")  # install (once) before loading
library(TCGAbiolinks)
library(SummarizedExperiment)

CancerProject <- "TCGA-KIRC"
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  sample.type = c("Primary Tumor"),
                  workflow.type = "HTSeq - Counts")
# download raw counts for DESeq2
GDCdownload(query)
data <- GDCprepare(query, save = TRUE, save.filename = "exp.rda")
rna <- as.data.frame(SummarizedExperiment::assay(data)) # exp matrix
write.csv(rna, "rna.csv")
Tag: RNA-Seq

Dear rhasanvandj, as far as I know there is no option for doing this at the download step. You need to download the data and perform your analysis; then you can select the genes you are interested in (here, protein-coding genes).

Given a list of genes, you can retrieve their biotype (protein-coding, non-coding, etc.) from Ensembl with the biomaRt package.
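A minimal, offline sketch of that "download everything, then subset by biotype" idea in base R (not code from this thread). The count matrix and annotation table are toy stand-ins with hypothetical gene IDs and biotypes: counts plays the role of rna, and annot mimics a biomaRt getBM() result with ensembl_gene_id and gene_biotype columns. keep_protein_coding is a hypothetical helper name.

```r
# Toy count matrix standing in for 'rna' (hypothetical gene IDs and counts)
counts <- matrix(c(10, 0, 5, 7,
                   3, 1, 0, 9),
                 nrow = 4,
                 dimnames = list(c("gene1", "gene2", "gene3", "gene4"),
                                 c("sample1", "sample2")))

# Toy annotation shaped like a biomaRt getBM() result (hypothetical biotypes)
annot <- data.frame(ensembl_gene_id = c("gene1", "gene2", "gene3"),
                    gene_biotype    = c("protein_coding", "lncRNA", "protein_coding"),
                    stringsAsFactors = FALSE)

# Keep only rows whose ID is annotated as protein_coding.
# Genes absent from the annotation (gene4 here) are dropped as well.
keep_protein_coding <- function(counts, annot) {
  coding <- which(annot$gene_biotype == "protein_coding")
  coding_ids <- annot$ensembl_gene_id[coding]
  counts[rownames(counts) %in% coding_ids, , drop = FALSE]
}

coding_counts <- keep_protein_coding(counts, annot)
print(rownames(coding_counts))
# [1] "gene1" "gene3"
```

Note that genes with no biomaRt record are removed too, so it is worth comparing nrow() before and after filtering.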


Thank you so much, dear Hamid.

Answer by Barry Digby, 11 months ago

This is what Hamid Ghaedi is referring to:

library(biomaRt)

## filter for protein-coding genes in the matrix (currently > 50,000 rows)
## requires the 'rna' data frame created by the code in the question
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
mrna_attributes <- getBM(attributes = c("external_gene_name",
                                        "ensembl_gene_id",
                                        "gene_biotype"),
                         filters = "ensembl_gene_id",
                         values = rownames(rna),
                         mart = mart)
# keep only genes annotated as protein_coding, then subset the count matrix
mrna_attributes <- mrna_attributes[which(mrna_attributes$gene_biotype == "protein_coding"), ]
rna <- rna[which(rownames(rna) %in% mrna_attributes$ensembl_gene_id), ]
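One thing that could make this filter return nothing: if the row names of rna carry an Ensembl version suffix (for example ENSG00000000001.5), they will not match the unversioned ensembl_gene_id values that biomaRt uses. Whether your matrix has versioned IDs depends on how it was prepared, so check a few row names first. A small base-R sketch with hypothetical IDs; strip_version is an illustrative helper, not a TCGAbiolinks or biomaRt function.

```r
# Hypothetical row names, some with a ".<version>" suffix
versioned <- c("ENSG00000000001.5", "ENSG00000000002.12", "ENSG00000000003")

# Drop a trailing ".<digits>" so IDs match biomaRt's unversioned ensembl_gene_id
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

print(strip_version(versioned))
# [1] "ENSG00000000001" "ENSG00000000002" "ENSG00000000003"

# Matching against a (hypothetical) set of protein-coding IDs
annot_ids <- c("ENSG00000000001", "ENSG00000000003")
keep <- strip_version(versioned) %in% annot_ids
print(keep)  # TRUE FALSE TRUE
```

If the suffixes are present, pass values = strip_version(rownames(rna)) to getBM() and use strip_version(rownames(rna)) in the final subsetting line as well.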

Thanks Barry, I tried this code but got this error:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'rna' not found

Then make sure the 'rna' object exists in your R session (run the code from your question first); R is complaining that it cannot find it. See the very end of the error message:

... object 'rna' not found
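Here rna is not a file but an R object created by rna <- as.data.frame(SummarizedExperiment::assay(data)) in the question's code, so it is gone after restarting R or if those lines were skipped. A sketch of a guard that reloads the matrix from disk when the object is missing; load_rna_if_missing is a hypothetical helper, and it assumes rna.csv was written earlier with write.csv(rna, "rna.csv").

```r
# Reload the saved count matrix if 'rna' is not defined in 'env'.
# Assumes the CSV was written with write.csv(rna, "rna.csv").
load_rna_if_missing <- function(path = "rna.csv", env = globalenv()) {
  if (!exists("rna", envir = env, inherits = FALSE)) {
    if (!file.exists(path)) {
      stop("neither the 'rna' object nor ", path, " is available; ",
           "re-run the GDCprepare()/assay() lines first")
    }
    # row.names = 1 restores the gene IDs written by write.csv();
    # check.names = FALSE keeps hyphenated TCGA barcodes unchanged
    assign("rna",
           read.csv(path, row.names = 1, check.names = FALSE),
           envir = env)
  }
  invisible(get("rna", envir = env))
}
```

Without check.names = FALSE, read.csv would turn the hyphens in TCGA sample barcodes into dots, so the column names would no longer match the original barcodes.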