Rob, 3.0 years ago

Hi all,
    Hi all,
I used the following code to download the TCGA RNA-seq data for TCGA-KIRC. The resulting count matrix includes all genes, but I want only protein-coding genes. Is there a way to filter the data so that only protein-coding genes are kept? Thanks.
library(TCGAbiolinks)

# Query STAR count files for primary tumor samples in TCGA-KIRC
query <- GDCquery(
    project = "TCGA-KIRC",
    data.category = "Transcriptome Profiling",
    data.type = "Gene Expression Quantification",
    experimental.strategy = "RNA-Seq",
    sample.type = "Primary Tumor",
    workflow.type = "STAR - Counts")
GDCdownload(query, method = "api")
# Prepare the downloaded files as a SummarizedExperiment
data_TCGA_STAR_KIRC <- GDCprepare(query)
# Generate the count matrix (assay() with no index returns the first assay)
rna_STAR <- as.data.frame(SummarizedExperiment::assay(data_TCGA_STAR_KIRC))
write.csv(rna_STAR, "STAR Count_expression_mRNA.csv")
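
For reference, here is a minimal sketch of the kind of filter I have in mind. It assumes that each gene has a biotype label and that protein-coding genes are labeled "protein_coding". The helper name filter_protein_coding and the toy data are hypothetical. I am also assuming, without having verified it for this object, that rowData(data_TCGA_STAR_KIRC) carries a gene_type column in the same row order as the count matrix.

```r
# Hypothetical helper: keep only rows annotated as protein coding.
# counts    : matrix or data frame with one row per gene
# gene_type : character vector of biotypes, in the same order as the rows
filter_protein_coding <- function(counts, gene_type) {
  stopifnot(nrow(counts) == length(gene_type))
  keep <- !is.na(gene_type) & gene_type == "protein_coding"
  counts[keep, , drop = FALSE]
}

# Toy data standing in for the real count matrix (values are made up).
toy_counts <- matrix(
  c(10, 0, 5, 20, 3, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ENSG_A", "ENSG_B", "ENSG_C"), c("sample1", "sample2"))
)
toy_types <- c("protein_coding", "lncRNA", "protein_coding")
toy_filtered <- filter_protein_coding(toy_counts, toy_types)

# With the real object, ASSUMING rowData() has a gene_type column:
# gene_type <- SummarizedExperiment::rowData(data_TCGA_STAR_KIRC)$gene_type
# rna_STAR_pc <- filter_protein_coding(rna_STAR, gene_type)
# write.csv(rna_STAR_pc, "STAR Count_expression_protein_coding.csv")
```

If the gene_type column is not present, the biotype vector would have to come from a separate annotation (for example a GENCODE GTF matched on the Ensembl gene IDs in the row names), which this sketch does not cover.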