Question: RNA-seq data analysis
8 weeks ago, Peter wrote:

Hello everyone,

This is my first analysis of RNA-seq data. I am using the TCGAbiolinks package. Initially, I am using the "TCGA-BRCA" project and I am using samples of healthy tissue and primary tumors.

I am downloading the data in HTSeq-FPKM-UQ format and storing it in the variable "my_data". After downloading the data, I assign the samples to their groups: the TP vector stores the IDs of the primary tumor samples, and the NT vector stores the IDs of the normal tissue samples.
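
For readers following along, one way the two groups could be derived from the sample barcodes is sketched below. It relies on the TCGA barcode convention in which characters 14-15 hold the sample-type code ("01" for primary solid tumor, "11" for solid tissue normal). The barcodes and the helper name `sample_type_code` are made up for illustration; TCGAbiolinks also provides `TCGAquery_SampleTypes()` for the same purpose.

```r
# Hypothetical barcodes for illustration only (not real samples).
barcodes <- c("TCGA-AA-0001-01A-11R-A000-07",
              "TCGA-AA-0002-11A-11R-A000-07",
              "TCGA-AA-0003-01A-11R-A000-07")

# Hypothetical helper: characters 14-15 of a TCGA barcode are the sample-type code.
sample_type_code <- function(barcode) substr(barcode, 14, 15)

codes <- sample_type_code(barcodes)
TP <- barcodes[codes == "01"]  # primary solid tumor
NT <- barcodes[codes == "11"]  # solid tissue normal
```

The resulting vectors can then be used to select columns of the expression matrix, as in the DEA call in this post.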

My question is whether the following steps are adequate:

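library(TCGAbiolinks)

# Build the expression matrix and filter samples by Spearman correlation
dataPrep <- TCGAanalyze_Preprocessing(object = my_data, cor.cut = 0.6)
# Drop genes whose mean expression is below the 25% quantile
dataFilt <- TCGAanalyze_Filtering(tabDF = dataPrep,
                                  method = "quantile",
                                  qnt.cut = 0.25)
# dataSmNT / dataSmTP hold the barcodes of the normal / tumor samples
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[, dataSmNT],
                            mat2 = dataFilt[, dataSmTP],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01,
                            logFC.cut = 1,
                            method = "glmLRT")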

After these commands, I get an output containing the logFC, p-value, FDR, and other values. I am asking because I do not perform any normalization step myself: I am using the "HTSeq-FPKM-UQ" table, and I read that it is already normalized:

Fragments Per Kilobase of transcript per Million mapped reads upper quartile (FPKM-UQ) is a RNA-Seq-based expression normalization method. The FPKM-UQ is based on a modified version of the FPKM normalization method.

In addition, I would like to confirm one thing: with this approach, are the upregulated transcripts (FC greater than 1) the ones that are increased in the CTRL?
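
On the direction of the change, a small base-R sketch of the arithmetic may help. It assumes the log fold change is computed as log2(tumor / normal); whether `TCGAanalyze_DEA` reports Cond2type relative to Cond1type in exactly this way is an assumption here and should be checked against the package documentation. The expression values and gene names are invented. Note also that `logFC.cut = 1` is on the log2 scale, so it corresponds to a two-fold change rather than to FC greater than 1.

```r
# Invented mean expression values for three hypothetical genes.
normal <- c(geneA = 10, geneB = 40, geneC = 20)
tumor  <- c(geneA = 40, geneB = 10, geneC = 22)

# Assumed convention: tumor relative to normal, on the log2 scale.
logFC <- log2(tumor / normal)

# |logFC| >= 1 corresponds to at least a two-fold change in either direction.
direction <- ifelse(logFC >= 1, "up in tumor",
             ifelse(logFC <= -1, "down in tumor", "below cutoff"))

print(data.frame(logFC = round(logFC, 2), direction))
```

Under this convention a positive logFC means higher expression in the tumor group, and swapping the two groups flips the sign.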

Thanks in advance!

8 weeks ago, Hamid Ghaedi wrote:

For differential expression analysis, most packages, such as edgeR (which TCGAbiolinks uses for its DE analysis) and DESeq2, need raw, un-normalized counts (HTSeq counts). If you would like to read more about why raw counts are required, see the edgeR and DESeq2 user guides. The following will get you the raw counts. Once you have them, you are good to go for the rest of your analysis.

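library(TCGAbiolinks)

# Query the raw HTSeq counts for the TCGA-BRCA project
query_TCGA <- GDCquery(
  project = "TCGA-BRCA",
  data.category = "Transcriptome Profiling", # parameter enforced by GDCquery
  experimental.strategy = "RNA-Seq",
  workflow.type = "HTSeq - Counts")

# Download the files matched by the query
GDCdownload(query = query_TCGA)

# Assemble the downloaded files into one object and save it to exp.rda
my_data <- GDCprepare(query = query_TCGA, save = TRUE, save.filename = "exp.rda")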
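
Before running the DE step, it can be worth confirming that the matrix really holds raw counts. The check below is a minimal sketch in base R; `looks_like_raw_counts` is a hypothetical helper, and the two toy matrices are invented. With real data, the matrix would come from the prepared object (for a SummarizedExperiment, typically via `SummarizedExperiment::assay(my_data)`).

```r
# Hypothetical helper: TRUE when every value is a finite, non-negative whole number.
looks_like_raw_counts <- function(m) {
  m <- as.matrix(m)
  all(is.finite(m)) && all(m >= 0) && all(m == round(m))
}

# Invented examples: one count-like matrix and one with normalized, fractional values.
raw_counts <- matrix(c(0, 12, 530, 7), nrow = 2)
fpkm_like  <- matrix(c(0, 1530.7, 88210.4, 12.9), nrow = 2)

looks_like_raw_counts(raw_counts)
looks_like_raw_counts(fpkm_like)
```

A matrix that fails this check (for example an FPKM-UQ table) has most likely already been normalized and is not what edgeR or DESeq2 expect as input.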